In ancient times, Statistics was regarded only as a science of statecraft and
was used to collect information relating to crimes, military strength, popula- 
tion, wealth, etc. for devising military and fiscal policies. But, today, 
Statistics is not merely a by-product of the administrative set-up of the state 
but it embraces all sciences—social, physical and natural, and is finding 
numerous applications in various diverse fields such as agriculture, industry, 
sociology, biometry, planning, economics, business, management, insur- 
ance, accountancy and auditing, and so on. Statistics (theory and methods) is 
used extensively by the government or business or management organisa- 
tions in planning future programmes and formulating policy decisions. It is 
rather impossible to think of any sphere of human activity where Statistics 
does not creep in. The subject of Statistics has acquired tremendous progress 
in the recent past so much so that an elementary knowledge of statistical 
methods has become a part of the general education in the curricula of many 
academic and professional courses. 


PREFACE 


This book is a modest though determined bid to serve as a text-book for 
B.Com. (Pass and Hons.) and B.A. (Economics Hons.) Courses of Indian 
Universities. The main aim in writing this book is to present a clear, sim- 
ple, systematic and comprehensive exposition of the principles, methods and 
techniques of Statistics in various disciplines, with special reference to 
Economics and Business. The stress is on the application of techniques and 
methods most commonly used by statisticians. The lucidity of style and 
simplicity of expression have been our twin objectives in preparing this 
text. Mathematical complexity has been avoided as far as possible. Wherever
desirable, the notations and terminology have been clearly explained and then 
all the mathematical steps have been explained in detail. 


An attempt has been made to start with the explanation of the elementaries 
of a topic and then the complexities and intricacies of the advanced problems 
have been explained and solved in a lucid manner. A number of typical prob- 
lems mostly from various university examination papers have been solved as 
illustrations so as to expose the students to different techniques of tackling 
the problems and enable them to have a better and thoughtful understanding 
of the basic concepts of theory and its various applications. At many places, 
explanatory remarks have been given to widen the readers' horizon. Moreover,
in order to enable the readers to have a proper appreciation of the subject-matter
and to fortify their confidence in the understanding and application of
methods, a large number of carefully graded problems, mostly drawn from
various university examination papers, have been given as exercise sets in
each chapter. Answers to the problems in the exercise sets are given at the
end of each problem.


Keeping in mind the present examination trend in various universities, 
Objective Type Questions have been given at the end of each chapter. This 
will enable the readers to have a clear understanding and grasp of the basic 
concepts of the subject.


The book contains 16 Chapters. We will not enumerate the topics discussed
in the text, since an idea of these can be obtained from a cursory glance at
the table of contents. Chapters 1 to 11 are devoted to Descriptive Statistics,
which consists in describing some characteristics like averages, dispersion,
skewness, kurtosis, correlation, etc. of the numerical data. In spite of many
latest developments in statistical techniques, old topics like Classification
and Tabulation (Chapter 3) and Diagrammatic and Graphic Representation
(Chapter 4) have been discussed in detail, since they still constitute the bulk
of statistical work in government and business organisations. The use of
statistical methods as scientific tools in the analysis of economic and business
data has been explained in Chapter 10 (Index Numbers) and Chapter 11 (Time
Series Analysis). Chapters 12 to 14 relate to advanced topics like Probability,
Random Variable, Mathematical Expectation and Theoretical Distributions. An
attempt has been made to give a detailed discussion of these topics on modern
lines through the concepts of Sample Space and Axiomatic Approach in a very
simple and lucid manner. Chapter 15 (Sampling and Design of Sample Surveys)
explains the various techniques of planning and executing statistical enquiries
so as to arrive at valid conclusions about the population. Chapter 16
(Interpolation and Extrapolation) deals with the techniques of estimating the
value of a function y = f(x) for any given intermediate value of the variable x.


We must unreservedly acknowledge our deep debt of gratitude to the 
numerous authors whose great and masterly works we have consulted during 
the preparation of the manuscript. 


We take this opportunity to express our sincere gratitude to Prof. Kanwar 
Sen, Shri V.K. Kapoor and a number of our students for their valuable help
and suggestions in the preparation of this book. 


Last but not the least, we express our deep sense of gratitude to our
publishers M/s. Himalaya Publishing House, for their untiring efforts and
unfailing courtesy and co-operation in bringing out the book in time in such
an elegant form.


Every effort has been made to avoid printing errors, though some might 
have crept in inadvertently. We shall be obliged if any such errors are 
brought to our notice. Valuable suggestions and criticisms for the improve- 
ment of the book from our colleagues (who are teaching this course) and 
students will be highly appreciated and duly incorporated in subsequent
editions.


June, 1988 


S.C. Gupta 
Mrs. Indra Gupta 


CONTENTS 


1. INTRODUCTION — MEANING AND SCOPE
1.1 Origin and Development of Statistics
1.2 Definition of Statistics
1.3 Importance and Scope of Statistics
1.4 Limitations of Statistics
1.5 Distrust of Statistics


2. COLLECTION OF DATA
2.1 Introduction
2.1.1 Objectives and Scope of the Enquiry
2.1.2 Statistical Units to be Used
2.1.3 Sources of Information (Data)
2.1.4 Methods of Data Collection
2.1.5 Degree of Accuracy Aimed at in the Final Results
2.1.6 Type of Enquiry
2.2 Primary and Secondary Data
2.2.1 Choice between Primary and Secondary Data
2.3 Methods of Collecting Primary Data
2.3.1 Direct Personal Investigation
2.3.2 Indirect Oral Investigation
2.3.3 Information Received Through Local Agencies
2.3.4 Mailed Questionnaire Method
2.3.5 Schedules Sent Through Enumerators
2.4 Drafting or Framing the Questionnaire
2.5 Sources of Secondary Data
2.5.1 Published Sources
2.5.2 Unpublished Sources
2.6 Precautions in the Use of Secondary Data


3. CLASSIFICATION AND TABULATION
3.1 Introduction
3.2 Classification
3.2.1 Functions of Classification
3.2.2 Rules of Classification
3.2.3 Bases of Classification
3.3 Frequency Distribution
3.3.1 Array
3.3.2 Discrete or Ungrouped Frequency Distribution
3.3.3 Grouped Frequency Distribution
3.3.4 Continuous Frequency Distribution
3.4 Basic Principles for Forming a Grouped Frequency Distribution
3.4.1 Types of Classes
3.4.2 Number of Classes
3.4.3 Size of Class Intervals
3.4.4 Types of Class Intervals


3.5 Cumulative Frequency Distribution
3.5.1 Less Than Cumulative Frequency
3.5.2 More Than Cumulative Frequency
3.6 Bivariate Frequency Distribution
3.7 Tabulation—Meaning and Importance
3.7.1 Parts of a Table
3.7.2 Requisites of a Good Table
3.7.3 Types of Tabulation


4. DIAGRAMMATIC AND GRAPHIC REPRESENTATION
4.1 Introduction
4.2 Difference between Diagrams and Graphs
4.3 Diagrammatic Representation
4.3.1 General Rules for Constructing Diagrams
4.3.2 Types of Diagrams
4.3.3 One Dimensional Diagrams
4.3.4 Two Dimensional Diagrams
4.3.5 Three Dimensional Diagrams
4.3.6 ...
4.3.7 Choice of Diagram
4.4 Graphical Representation of Data
4.4.1 Technique of Construction of Graphs
4.4.2 General Rules for Graphing
4.4.3 Graphs of Frequency Distributions
4.4.4 Graphs of Time Series or Historigrams
4.4.5 Semi-Logarithmic Line Graphs or Ratio Charts
4.5 Limitations of Diagrams and Graphs


5. AVERAGES
5.1 Introduction
5.2 Requisites of a Good Average or Measure of Central Tendency
5.3 Various Measures of Central Tendency
5.4 Arithmetic Mean
5.4.1 Step Deviation Method for Computing Arithmetic Mean
5.4.2 Mathematical Properties of Arithmetic Mean
5.5 Weighted Arithmetic Mean
5.6 Median
5.6.1 Partition Values
5.6.2 Graphic Method of Locating Partition Values
5.7 Mode
5.7.1 Computation of Mode
5.7.2 Merits and Demerits of Mode
5.7.3 Graphic Location of Mode
5.8 Empirical Relation between Mean (M), Median (Md) and Mode (Mo)



5.9 Geometric Mean
5.9.1 Merits and Demerits of Geometric Mean
5.9.2 Compound Interest Formula
5.9.3 Average Rate of Variable which Increases by Different Rates at Different Periods
5.9.4 Weighted Geometric Mean
5.10 Harmonic Mean
5.10.1 Merits and Demerits of Harmonic Mean
5.10.2 Weighted Harmonic Mean
5.11 Relations between Arithmetic Mean, Geometric Mean and Harmonic Mean
5.12 Selection of an Average
5.13 Limitations of Averages


6. DISPERSION
6.1 Introduction and Meaning
6.2 Characteristics for an Ideal Measure of Dispersion
6.3 Absolute and Relative Measures of Dispersion
6.4 Measures of Dispersion
6.5 Range
6.5.1 Merits and Demerits of Range
6.5.2 Uses
6.6 Quartile Deviation or Semi-Inter-Quartile Range
6.6.1 Merits and Demerits of Quartile Deviation
6.7 Percentile Range
6.8 Mean Deviation or Average Deviation
6.8.1 Computation of Mean Deviation
6.8.2 Short-cut Method of Computing Mean Deviation
6.8.3 Merits and Demerits of Mean Deviation
6.8.4 Uses
6.8.5 Relative Measures of Mean Deviation
6.9 Standard Deviation
6.9.1 Merits and Demerits of Standard Deviation
6.9.2 Variance and Mean Square Deviation
6.9.3 Different Formulae for Calculating Variance
6.10 Standard Deviation of the Combined Series
6.11 Coefficient of Variation
6.12 Relation between Various Measures of Dispersion
6.13 Lorenz Curve


7. SKEWNESS AND KURTOSIS
7.1 Introduction
7.2 Skewness
7.2.1 Measures of Skewness
7.3 Moments




7.3.1 Moments About Mean
7.3.2 Moments About Arbitrary Point 'A'
7.3.3 Relation between Moments About Mean and Moments About Arbitrary Point 'A'
7.3.4 Sheppard's Correction for Moments
7.3.5 Charlier Checks
7.4 Karl Pearson's Beta (β) and Gamma (γ) Coefficients Based on Moments
7.5 Coefficient of Skewness Based on Moments
7.6 Kurtosis

8. CORRELATION
8.1 Introduction
8.1.1 Types of Correlation
8.1.2 Correlation and Causation
8.2 Methods of Studying Correlation
8.3 Scatter Diagram Method
8.4 Karl Pearson's Coefficient of Correlation (Covariance Method)
8.4.1 Properties of Correlation Coefficient
8.4.2 Assumptions Underlying Karl Pearson's Correlation Coefficient
8.4.3 Interpretation of r
8.5 Probable Error
8.6 Correlation in Bivariate Frequency Table
8.7 Rank Correlation Method
8.7.1 Limits for r
8.7.2 Computation of Rank Correlation Coefficient (r)
8.7.3 Remarks on Spearman's Rank Correlation Coefficient
8.8 Method of Concurrent Deviations
8.9 Coefficient of Determination

9. LINEAR REGRESSION ANALYSIS
9.1 Introduction
9.2 Linear and Non-Linear Regression
9.3 Lines of Regression
9.3.1 Derivation of Line of Regression of Y on X
9.3.2 Line of Regression of X on Y
9.3.3 Angle between the Regression Lines
9.4 Coefficient of Regression
9.4.1 Theorems on Regression Coefficients
9.5 To Find the Mean Values (x̄, ȳ) from the Two Lines of Regression
9.6 To Find the Regression Coefficients and the Correlation Coefficient from the Two Lines of Regression




1. Introduction—Meaning and Scope


1.1. Origin and Development of Statistics. The subject of
Statistics, as it seems, is not a new discipline but is as old as
human society itself. It has been used right from the existence of
life on this earth, although the sphere of its utility was very much


word ‘statistik’ or the French word ‘statistique’, each of which 
means a political state. In the ancient times the scope of Statistics 
was primarily limited to the collection of the following data by the 
governments for framing military and fiscal policies : 

(i) Age and sex-wise population of the country ;

(ii) Property and wealth of the country ;

the former enabling the government to have an idea of the manpower
of the country (in order to safeguard itself against any outside
aggression) and the latter providing it with information for the
introduction of new taxes and levies.


middle ages. In India, an efficient system of collecting official and 
administrative statistics existed even 2000 years ago—in particular 


Surveys conducted during Akbar’s reign is available in the book 
In Germany, the systematic collection of official statistics
originated towards the end of the 18th century when, in order to 
have an idea of the relative strength of different German States, 
information regarding population and output—industrial and agri- 
cultural—was collected. In England, statistics were the outcome of 
Napoleonic wars. The wars necessitated the systematic collection 
of numerical data to enable the government to assess the revenues 
and expenditure with greater precision and then to levy new taxes 
in order to meet the cost of war. 


The sixteenth century saw the application of Statistics to the
collection of data relating to the movements of heavenly bodies
(stars and planets) in order to know their positions and to predict
eclipses. J. Kepler made a detailed study of the information
collected by Tycho Brahe (1546-1601) regarding the movements of
the planets and formulated his famous three laws relating to the
movements of heavenly bodies. These laws paved the way for the
discovery of Newton’s law of gravitation.


The seventeenth century witnessed the origin of Vital Statistics.
Captain John Graunt of London (1620-1674), known as the Father
of Vital Statistics, was the first man to make a systematic study of
birth and death statistics. Important contributions in this field
were also made by prominent persons like Caspar Neumann (in
1691), Sir William Petty (1623-1687), James Dodson, Thomas
Simpson and Dr. Price. The computation of mortality tables and
the calculation of expectation of life at different ages by these
persons led to the idea of ‘Life Insurance’, and a Life Insurance
Institution was founded in London in 1698. William Petty wrote the book
‘Essay on Political Arithmetic’. In those days Statistics was regarded
as Political Arithmetic. This concept of Statistics as Political
Arithmetic continued even in the early 18th century, when J. P. Sussmilch
(1707-1767), a Prussian clergyman, formulated his doctrine that
the ratio of births and deaths remains more or less constant and gave
an explanation to the theory of ‘Natural Order of Physiocratic
School’.


The backbone of the so-called modern theory of Statistics is
the ‘Theory of Probability’ or the ‘Theory of Games and Chance’,
which was developed in the mid-seventeenth century. The theory of
probability grew out of the prevalence of gambling among the
nobles of England and France, who wished to estimate their chances of
winning or losing in the gamble; the chief contributors were the
mathematicians and gamblers of France, Germany and England.
Two French mathematicians, Pascal (1623-1662) and P. Fermat
(1601-1665), after a lengthy correspondence between themselves
ultimately succeeded in solving the famous ‘Problem of Points’ posed
by the French gambler Chevalier de Méré, and this correspondence
laid the foundation stone of the science of probability. The next
stalwart in this field was J. Bernoulli (1654-1705), whose great treatise
on probability ‘Ars Conjectandi’ was published posthumously in
1713, eight years after his death, by his nephew Nicolaus Bernoulli
(1687-1759). This contained the famous ‘Law of Large Numbers’


Least Squares and established the ‘Normal Law of Errors’ independently
of De Moivre ; L.A.J. Quetelet (1796-1874) discovered the
principle of ‘Constancy of Great Numbers’ which forms the basis of
sampling ; Euler, Lagrange, Bayes, etc. Russian mathematicians have
also made very outstanding contributions to the modern theory of
probability; the main contributors, to mention only a few of them,
are : Chebychev (1821-1894), who founded the Russian School of
Statisticians ; A. Markov (Markov Chains) ; Liapounoff (Central
Limit Theorem) ; A. Khinchine (Law of Large Numbers) ; A.
Kolmogorov (who axiomatised the calculus of probability) ; Smirnov,

t-test ushered in an era of exact (small) sample tests. Perhaps most


ments. His contributions to the subject of Statistics are described by
one writer in the following words :


“R.A. Fisher is the real giant in the development of the theory of
Statistics.”


It is only the varied and outstanding contributions of R.A.
Fisher that put the subject of Statistics on a very firm footing and
earned for it the status of a full-fledged science.


1.2. Definition of Statistics. Statistics has been defined 
differently by different writers from time to time so much so that 
scholarly articles have collected together hundreds of definitions,
emphasizing precisely the meaning, scope and limitations of the
subject. The reasons for such a variety of definitions may be broadly
classified as follows :


(i) The field of utility of Statistics has been increasing steadily 
and thus different people defined it differently according to the 
developments of the subject. In old days, Statistics was regarded as 
the ‘science of statecraft’ but today it embraces almost every sphere 
of natural and human activity. Accordingly, the old definitions which 
were confined to a very limited and narrow field of enquiry were 
replaced by the new definitions which are more exhaustive and 
elaborate in approach. 


(ii) The word Statistics has been used to convey different
meanings in the singular and plural sense. When used as a plural,
statistics means a set of numerical data, and when used in the singular
sense it means the science of statistical methods embodying the theory and
techniques used for collecting, analysing and drawing inferences
from the numerical data.


It is practically impossible to enumerate all the definitions
given to Statistics both as ‘Numerical Data’ and ‘Statistical Methods’
due to limitations of space. However, we give below some selected
definitions.


WHAT THEY SAY ABOUT STATISTICS—
SOME DEFINITIONS
"STATISTICS AS NUMERICAL DATA"

1. “Statistics are the classified facts representing the conditions of
the people in a State...specially those facts which can be stated
in numbers or in tables of numbers or in any tabular or classified
arrangement.” —Webster.

2. “Statistics are numerical statements of facts in any department
of enquiry placed in relation to each other.” —Bowley.

3. “By Statistics we mean quantitative data affected to a marked
extent by multiplicity of causes.” —Yule and Kendall.

4. “Statistics may be defined as the aggregate of facts affected to a
marked extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to a reasonable standard
of accuracy, collected in a systematic manner, for a predetermined
purpose and placed in relation to each other.”

—Prof. Horace Secrist.


Remarks and Comments. 1. According to Webster's definition
only numerical facts can be termed Statistics. Moreover, it
restricts the domain of Statistics to the affairs of a State, i.e., to
social sciences. This is a very old and narrow definition and is
inadequate for modern times since today Statistics embraces all
sciences—social, physical and natural.
2. Bowley's definition is more general than Webster's since it
is related to numerical data in any department of enquiry. Moreover,
it also provides for comparative study of the figures as against
mere classification and tabulation of Webster's definition.


3. Yule and Kendall's definition refers to numerical data
affected by a multiplicity of causes. This is usually the case in
social, economic and business phenomena. For example, the
prices of a particular commodity are affected by a number of factors,
viz., supply, demand, imports, exports, money in circulation,
competitive products in the market and so on. Similarly, the yield of a
particular crop depends upon a multiplicity of factors like quality of
seed, fertility of soil, method of cultivation, irrigation facilities,
weather conditions, fertilizer used and so on.


4. Secrist’s definition seems to be the most exhaustive of all
the four. Let us try to examine it in detail.


(i) Aggregate of Facts. Simple or isolated items cannot be
termed Statistics unless they are a part of an aggregate of facts
relating to a particular field of enquiry. For instance, the height of
an individual or the price of a particular commodity does not form
Statistics, as such figures are unrelated and not comparable. However,
aggregates of the figures of births, deaths, sales, purchases,
production, profits, etc., over different times, places, etc., will constitute
Statistics.


(ii) Affected by Multiplicity of Causes. Numerical figures
should be affected by a multiplicity of factors. This point has already
been elaborated in remark 3 above. In physical sciences, it is
possible to isolate the effect of various factors on a single item but it is
very difficult to do so in social sciences, particularly when the effect
of some of the factors cannot be measured quantitatively. However,
statistical techniques have been devised to study the joint effect of a
number of factors on a single item (Multiple Correlation) or the
isolated effect of a single factor on the given item (Partial
Correlation), provided the effect of each of the factors can be measured
quantitatively.

(iii) Numerically Expressed. Only numerical data constitute
Statistics. Thus statements like ‘the standard of living of
the people in Delhi has improved’ or ‘the production of a particular
commodity is increasing’ do not constitute Statistics. In
particular, qualitative characteristics which cannot be measured
quantitatively, such as intelligence, beauty, honesty, etc., cannot
be termed Statistics unless they are numerically expressed by
assigning particular scores as quantitative standards. For example,
intelligence is not Statistics, but intelligence quotients, which
may be interpreted as quantitative measures of the intelligence of
individuals, could be regarded as Statistics.


(iv) Enumerated or Estimated According to Reasonable
Standard of Accuracy. The numerical data pertaining to any
field of enquiry can be obtained by completely enumerating the
underlying population. In such a case data will be exact and
accurate (but for the errors of measurement, personal bias, etc.).
However, if complete enumeration of the underlying population is
not possible (e.g., if the population is infinite, or if testing is
destructive, i.e., if the item is destroyed in the course of inspection,
as in testing explosives, light bulbs, etc.), or if, even though possible,
it is not practicable for certain reasons (such as the population being
very large, a high cost of enumeration per unit and our resources being
limited in terms of time and money, etc.), then the data are estimated
by using the powerful techniques of Sampling and Estimation
theory. However, the estimated values will not be as precise and
accurate as the actual values. The degree of accuracy of the
estimated values largely depends on the nature and purpose of the
enquiry. For example, while measuring the heights of individuals
accuracy will be aimed at in terms of fractions of an inch, whereas
while measuring the distance between two places it may be in terms of
metres, and if the places are very distant, say Delhi and London,
a difference of a few kilometres may be ignored. However, certain
standards of accuracy must be maintained for drawing meaningful
conclusions.
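
The contrast between complete enumeration and estimation from a sample can be illustrated with a short sketch. This is only an illustration under stated assumptions: the population of 10,000 heights is simulated (it is not data from the text), the helper names `enumerate_mean` and `sample_estimate` are hypothetical, and simple random sampling without replacement is assumed as the sampling technique.

```python
import random
import statistics


def enumerate_mean(population):
    """Complete enumeration: every unit of the population is measured."""
    return statistics.fmean(population)


def sample_estimate(population, n, seed=0):
    """Estimate the population mean from a simple random sample of size n.

    Returns (estimate, standard_error). Sampling is without replacement,
    so the standard error carries the finite population correction.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample standard deviation
    big_n = len(population)
    fpc = ((big_n - n) / big_n) ** 0.5    # finite population correction
    return mean, fpc * s / n ** 0.5


# Hypothetical population: heights (in inches) of 10,000 individuals.
rng = random.Random(42)
heights = [rng.gauss(66.0, 3.0) for _ in range(10_000)]

true_mean = enumerate_mean(heights)
est, se = sample_estimate(heights, n=400, seed=1)
print(f"complete enumeration : {true_mean:.2f} in")
print(f"sample of 400        : {est:.2f} in (standard error {se:.2f})")
```

The sample measures only 4 per cent of the units, yet its estimate would normally fall within a fraction of an inch of the enumerated mean, which is the order of accuracy the text suggests for heights; the standard error indicates how far the estimate may reasonably be expected to stray.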


(v) Collected in a Systematic Manner. The data must be 
collected in a very systematic manner. Thus, for any socio-economic
survey, a proper schedule depending on the object of enquiry should 
be prepared and trained personnel (investigators) should be used to 
collect the data by interviewing the persons. An attempt should be 
made to reduce the personal bias to the minimum. Obviously, the 
data collected in a haphazard way will not conform to the reason- 
able standards of accuracy and the conclusions based on them might 
lead to wrong or misleading decisions. 


(vi) Collected for a Pre-determined Purpose. It is of 
utmost importance to define in clear and concrete terms the objec- 
tives or the purpose of the enquiry and the data should be collected 
keeping in view these objectives. An attempt should not be made 
to collect too many data some of which are never examined or 
analysed i.e., we should not waste time in collecting the information 
which is irrelevant for our enquiry. Also it should be ensured that
no essential data are omitted. For example, if the purpose of enquiry 
is to measure the cost of living index for low income group people, 
we should select only those commodities or items which are consu- 
med or utilised by persons belonging to this group. Thus for such 
an index, the collection of the data on the commodities like scooters, 
cars, refrigerators, television sets, high quality cosmetics etc., will 
be absolutely useless. 


(vii) Comparable. From the practical point of view, for
statistical analysis the data should be comparable. They may be
compared with respect to some unit, generally time (period) or place.
For example, the data relating to the population of a country for
different years or the population of different countries in some fixed
year constitute Statistics, since they are comparable. However, the
data relating to the size of the shoe of an individual and his
intelligence quotient (I.Q.) do not constitute Statistics as they are
not comparable. In order to make valid comparisons the data should
be homogeneous, i.e., they should relate to the same phenomenon or
subject.


5. From the definition of Horace Secrist and its discussion in
remark 4 above, we may conclude that :


“All Statistics are numerical statements of facts but all
numerical statements of facts are not Statistics”.


6. We give below the definitions of Statistics used in the singular
sense, i.e., Statistics as Statistical Methods.

WHAT THEY SAY ABOUT STATISTICS—
SOME DEFINITIONS
"STATISTICS AS STATISTICAL METHODS"

1. “Statistics may be called the science of counting.”
—Bowley A.L.

2. “Statistics may rightly be called the science of averages.”
—Bowley A.L.

3. “Statistics is the science of the measurement of social organism,
regarded as a whole in all its manifestations.”
—Bowley A.L.

4. “Statistics is the science of estimates and probabilities.”
—Boddington

5. “The science of Statistics is the method of judging collective,
natural or social phenomenon from the results obtained from the
analysis or enumeration or collection of estimates.”
—King

6. “Statistics is the science which deals with classification
and tabulation of numerical facts as the basis for explanation,
description and comparison of phenomenon.”
—Lovitt

7. “Statistics is the science which deals with the methods of
collecting, classifying, presenting, comparing and interpreting
numerical data collected to throw some light on any sphere of enquiry.”
—Seligman

8. “Statistics may be defined as the science of collection,
presentation, analysis and interpretation of numerical data.”
—Croxton and Cowden

9. “Statistics may be regarded as a body of methods for
making wise decisions in the face of uncertainty.”
—Wallis and Roberts

10. “Statistics is a method of decision making in the face of
uncertainty on the basis of numerical data and calculated risks.”
—Prof. Ya-Lun-Chou

11. “The science and art of handling aggregate of facts—
observing, enumeration, recording, classifying and otherwise
systematically treating them.”
—Harlow


Some Comments and Remarks. 1. The first three definitions
due to Bowley are inadequate.


2. Boddington’s definition also fails to describe the meaning
and functions of Statistics since it is confined to only probabilities
and estimates, which form only a part of the modern statistical tools
and do not describe the science of Statistics in all its
manifestations.


3. King's definition is also inadequate since it confines
Statistics only to social sciences. Lovitt's definition is fairly satisfactory,
though incomplete. Seligman's definition, though very short and
simple, is quite comprehensive. However, the best of all the above
definitions seems to be the one given by Croxton and Cowden.


4. Wallis and Roberts’ definition is quite modern since statistical
methods enable us to arrive at valid decisions. Prof. Chou's
definition in number 10 is a modified form of this definition.


5. Harlow's definition describes Statistics both as a science 
and an art—science, since it provides tools and laws for the analysis 
of the numerical information collected from the source of enquiry 
and art, since it undeniably has its basis upon numerical data col- 
lected with a view to maintain a particular balance and consistency 
leading to perfect or nearly perfect conclusions. A statistician, like
an artist, will fail in his job if he does not possess the requisite skill,
experience and patience while using statistical tools for any
problem.


1.3. Importance and Scope of Statistics. In ancient
times Statistics was regarded only as the science of Statecraft and
was used to collect information relating to crimes, military strength,
population, wealth, etc., for devising military and fiscal policies.
But with the concept of the Welfare State taking root almost all over
the world, the scope of Statistics has widened to social and economic
phenomena. Moreover, with the developments in statistical
techniques during the last few decades, Statistics is today viewed
not merely as a device for collecting numerical data but as a
means of sound techniques for their handling and analysis and for
drawing valid inferences from them. Accordingly, it is not merely
a by-product of the administrative set-up of the State but it
embraces all sciences—social, physical, and natural, and is finding
numerous applications in various diversified fields such as agriculture,
industry, sociology, biometry, planning, economics, business,
management, psychometry, insurance, accountancy and auditing,
and so on. It is rather impossible to think of any sphere of human
activity where Statistics does not creep in. It will not be an
exaggeration to say that Statistics has assumed unprecedented dimensions
these days and statistical thinking is becoming more and more
indispensable every day for able citizenship. In fact, to a very striking
degree, modern culture has become a statistical culture, and the
subject of Statistics has made such tremendous progress in the recent
past that an elementary knowledge of statistical methods
has become a part of the general education in the curricula of
many countries. The importance of Statistics is amply explained in
the following words of Carrol D. Wright (1887), United States
Commissioner of the Bureau of Labour :
“To a very striking degree our culture has become a Statistical
culture. Even a person who may never have heard of an index number
is affected...by...of those index numbers which describe the cost of
living. It is impossible to understand Psychology, Sociology, Economics,
Finance or a Physical Science without some general idea of the
meaning of an average, of variation, of concomitance, of sampling, of
how to interpret charts and tables.”


There is no ground for misgivings regarding the practical
realisation of the dream of H.G. Wells, viz., “Statistical thinking
will one day be as necessary for effective citizenship as the ability to
read and write.” Statistics has become so indispensable in all
phases of human endeavour that it is often remarked, “Statistics is
what statisticians do”, and it appears that Bowley was right when
he said, “A knowledge of Statistics is like a knowledge of a foreign
language or of algebra; it may prove of use at any time under any
circumstances.”

Let us now discuss briefly the importance of Statistics in some 
different disciplines. 


Statistics in Planning. Statistics is indispensable in planning,
be it at the business, economic or government level. The
modern age is termed the ‘age of planning’ and almost all
organisations in government, business or management are resorting
to planning for efficient working and for formulating policy
decisions. To achieve this end, statistical data relating to
production, consumption, prices, investment, income, expenditure and
so on, and advanced statistical techniques for handling such data,
such as index numbers, time series analysis, demand analysis and
forecasting techniques, are of paramount importance.
Today efficient planning is a must for almost all countries,
particularly the developing economies, for their economic development,
and for planning to be successful it must be based on a
correct and sound analysis of complex statistical data. For instance,
in formulating a five-year plan, the government must have an idea
of the age- and sex-wise break-up of the population projections of
the country for the next five years in order to develop its various
sectors like agriculture, industry, textiles, education and so on.
This is achieved through the powerful statistical tool of forecasting
by making use of the population data for previous years. Even
for making decisions concerning the day-to-day policy of the
country, an accurate statistical knowledge of the age- and sex-wise
composition of the population is imperative for the government.
In India, the use of Statistics in planning was visualised long
ago, and the National Sample Survey (N.S.S.) was set
up in 1950 primarily for the collection of statistical data for planning in
India.

Statistics in State. As has already been pointed out, in the
old days Statistics was the science of statecraft and its objective
was to collect data relating to manpower, crimes, income and
wealth, etc., for formulating suitable military and fiscal policies.
With the inception of the idea of the Welfare State and its taking deep
roots in almost all countries, today statistical data relating to
prices, production, consumption, income and expenditure, investments
and profits, etc., and statistical tools such as index numbers, time
series analysis, demand analysis and forecasting are extensively
used by governments in formulating economic policies. (For
details see Statistics in Economics.) Moreover, as pointed out earlier
(Statistics in Planning), statistical data and techniques are
indispensable to the government for planning future economic
programmes. The study of population movement, i.e., population
estimates, population projections and other allied studies together 
with birth and death statistics according to age and sex distribution 
provide any administration with fundamental tools which are 
indispensable for overall planning and evaluation of economic and 
social development programmes. The facts and figures relating to 
births, deaths and marriages are of extreme importance to various 
official agencies for a variety of administrative purposes. Mortality 
(death) statistics serve as a guide to the health authorities for 
sanitary improvements, improved medical facilities and public 
cleanliness. The data on the incidence of diseases together with the
number of deaths by age and nature of diseases are of paramount
importance to health authorities in taking appropriate remedial
action to prevent or control the spread of the disease. The use of
statistical data and statistical techniques is so wide in government
functioning that today almost all ministries and departments
in the government have a separate statistical unit. In fact, today,
in most countries the State (government) is the single unit which 
is the biggest collector and user of statistical data. In addition to 
the various statistical bureaux in all the ministries and the govern- 
ment departments in the Centre and the States, the main Statistical 
Agencies in India are the Central Statistical Organisation (C.S.O.),
the National Sample Survey (N.S.S.), now called the National Sample
Survey Organisation (N.S.S.O.), and the Registrar General of
India (R.G.I.).


Statistics in Economics. The interaction between Statistics
and Economics was first observed by William Petty (towards the end of
the 17th century) in his book ‘Political Arithmetic’, but it took a fairly
long time for Statistics to be used effectively in the formulation of
economic theories and economic policies. The reason is that in the old
days economic theories were based on deductive logic only. Moreover,
statistical techniques were not sufficiently advanced for application
in other disciplines. It gradually dawned upon economists
of the Deductive School to use Statistics effectively by making
empirical studies, as the leading economist of this school, J.S. Mill,
wrote:


“In some cases instead of deducing our conclusions from
reasoning and verifying them from observations we begin by obtaining
them provisionally from specific experience and afterwards
connect them with the principles of human nature by a priori
reasoning.”
In 1871 W.S. Jevons, who developed the technique of analysis
of time series and also pioneered the studies of price statistics and
index numbers, wrote:


“The deductive science of economy must be verified and rendered 
useful from the purely inductive science of statistics. Theory must be 
invested with the reality of life and fact. Political economy might 
gradually be erected into the exact science if only commercial 
Statistics were far more complete and accurate than they are at 
present so that the formulae could be endowed with the exact 
meaning by the aid of numerical data." 


These views were supported by Roscher, Knies and Hildebrand
of the Historical School (1843-1883), Alfred Marshall,
Pareto, Lord Keynes and others, and today there are no two opinions
that, for the development and growth of economic science,
economic doctrines should not be argued in the abstract but
should be inductively verified. The following quotation from
Prof. Alfred Marshall in 1890 amply illustrates the role of Statistics
in Economics:


“Statistics are the straws out of which I, like every other
economist, have to make bricks.”


Statistics plays a very vital role in Economics, so much so that
in 1926 Prof. R.A. Fisher complained of “the painful misapprehension
that Statistics is a branch of Economics.”


Statistical data and advanced techniques of statistical analysis 
have proved immensely useful in the solution of a variety of econo- 
mic problems such as production, consumption, distribution of 
income and wealth, wages, prices, profits, savings, expenditure, 
investment, unemployment, poverty, etc. For example, the studies 
of consumption statistics reveal the pattern of the consumption of 
the various commodities by different sections of the society and 
also enable us to have some idea about their purchasing capacity 
and their standard of living. The studies of production statistics 
enable us to strike a balance between supply and demand which is 
provided by the laws of supply and demand. The income and 
wealth statistics are mainly helpful in reducing the disparities of 
income. The statistics of prices are needed to study the price theories 
and the general problem of inflation through the construction of 
the cost of living and wholesale price index numbers. The statistics
of market prices, costs and profits of different individual concerns 
are needed for the studies of competition and monopoly. Statistics 
pertaining to some macro-variables like production, income, ex- 
penditure, savings, investments, etc., are used for the compilation 
of National Income Accounts which are indispensable for economic 
planning of a country. Exchange statistics reflect upon the 
commercial development of a nation and tell us about the money in 
circulation and the volume of business done in the country. Statistical 
techniques have also been used in determining the measures of Gross 
National Product and Input-Output Analysis. The advanced and 
sound statistical techniques have been used successfully in the
analysis of cost functions, production functions and consumption 
functions. 


Use of Statistics in Economics has led to the formulation of 
many economic laws some of which are mentioned below for 
illustration:


A detailed and systematic study of the family budget data
which gives a detailed account of the family budgets showing 
expenditure on the main items of family consumption together with 
family structure and composition, family income and various other 
social, economic and demographic characteristics led to the famous 
Engel’s Law of Consumption in 1895. Vilfredo Pareto in 19th-20th 
century propounded his famous Law of Distribution of Income by 
making an empirical study of the income data of various countries 
of the world at different times. The study of the data pertaining 
to the actual observation of the behaviour of buyers in the market 
resulted in the Revealed Preference Analysis of Prof. Samuelson. 


Time Series Analysis, Index Numbers, Forecasting Techniques 
and Demand Analysis are some of the very powerful statistical tools 
which are used immensely in the analysis of economic data and 
also for economic planning. For instance, time series analysis is
extensively used in Business and Economic Statistics for the study
of series relating to prices, production and consumption of
commodities, money in circulation, bank deposits and bank clearings,
sales in a departmental store, etc., with the aim:


(i) to identify the forces or components at work, the net 
effect of whose interaction is exhibited by the movement of the time 
series. : 

(ii) to isolate, study, analyse and measure them independently
and

The index numbers, which are also termed ‘economic barometers’,
are numbers which reflect the changes over a specified
period of time in (i) prices of different commodities, (ii) industrial/
agricultural production, (iii) sales, (iv) imports and exports, (v)
cost of living, etc., and are extremely useful in economic planning.
For instance, the cost of living index numbers are used for (i) the
calculation of real wages and for determining the purchasing power
of money; (ii) the deflation of income and value series in
national accounts; (iii) the grant of dearness allowance (D.A.) or
bonus to workers in order to enable them to meet the increased
cost of living, and so on.
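
The first use mentioned above, the calculation of real wages, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `real_wage` and all the figures are hypothetical, and the index is taken to have base 100.

```python
def real_wage(money_wage, cost_of_living_index, base=100.0):
    """Deflate a money wage by a cost of living index (base period = `base`)."""
    if cost_of_living_index <= 0:
        raise ValueError("the index must be positive")
    return money_wage * base / cost_of_living_index


# Hypothetical figures: money wages and the cost of living index in two years.
wages = {"base year": 500.0, "later year": 750.0}
index = {"base year": 100.0, "later year": 125.0}

# Real wage = money wage expressed in base-year prices.
real = {year: real_wage(wages[year], index[year]) for year in wages}

# Purchasing power of one unit of money relative to the base year.
purchasing_power = {year: 100.0 / index[year] for year in index}
```

With these figures the money wage rises by 50 per cent, but the real wage rises only from 500 to 600 because the cost of living has gone up by 25 per cent.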


The demand analysis consists in making an economic study 
of the market data to determine the relation between : 


(i) the prices of a given commodity and its absorption 
capacity for the market i.e. demand and 


(ii) the price of a commodity and its output i.e., supply. 




Forecasting techniques based on the method of curve fitting
by the principle of least squares and exponential smoothing are
indispensable tools for economic planning.
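
As a rough illustration of the two techniques just named, the following Python sketch fits a straight-line trend by least squares and applies simple exponential smoothing. The series `sales`, the smoothing constant `alpha = 0.5` and the function names are assumptions made for the example only. A straight line is the simplest curve one might fit, and other curves may suit real data better.

```python
def fit_linear_trend(y):
    """Least-squares line y = a + b*t for t = 0, 1, ..., n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx              # slope of the trend line
    a = y_mean - b * t_mean    # intercept
    return a, b


def exponential_smoothing(y, alpha):
    """Simple exponential smoothing: s[0] = y[0], s[t] = alpha*y[t] + (1-alpha)*s[t-1]."""
    smoothed = [y[0]]
    for v in y[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


sales = [10.0, 12.0, 14.0, 16.0, 18.0]   # hypothetical annual series
a, b = fit_linear_trend(sales)
trend_forecast = a + b * len(sales)      # trend value projected one period ahead
smoothed = exponential_smoothing(sales, alpha=0.5)
```

The fitted line projects the trend forward, whereas the smoothed series gives more weight to recent observations. Which is preferable depends on the data at hand.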


The increasing interaction of mathematics and statistics with
economics led to the development of a new discipline called
Econometrics, and the first Econometric Society was founded in U.S.A. in
1930 for “the advancement of economic theory in its relation to
mathematics and statistics...” Econometrics aims at making
Economics a more realistic, precise, logical and practical science.
Econometric models based on sound statistical analysis are used for
maximum exploitation of the available resources. In other words,
an attempt is made to obtain optimum results subject to a number
of constraints on the resources at our disposal, say, of production,
capacity, capital, technology, precision, etc., which are determined
statistically.


Statistics in Business and Management. Prior to the Industrial
Revolution, when production was at the handicraft stage,
business activities were very limited and were confined
only to small units operating in their own areas. The owner of the
concern personally looked after all the departments of business 
activity like sales, purchase, production, marketing, finance and so 
on. But after the Industrial Revolution, the developments in
business activities have taken such unprecedented dimensions, both in
size and in the competition in the market, that the activities of
most business enterprises and firms are no longer confined to
one particular locality, town or place but extend over larger areas. Some of
the leading houses have a network of business activities in
almost all the leading towns and cities of the country and even
abroad. Accordingly it is impossible for a single person (the owner 
of the concern) to look after its activities and management has 
become a specialised job. A manager and a team of management
executives are imperative for the efficient handling of the various
operations like sales, purchase, production, marketing, control, 
finance, etc., of the business house. It is here that statistical data 
and the powerful statistical tools of probability, expectation, sampling
techniques, tests of significance, estimation theory, forecasting
techniques and so on play an indispensable role. According to
Wallis and Roberts, “Statistics may be regarded as a body of methods
for making wise decisions in the face of uncertainty.” A refinement
over this definition is provided by Prof. Ya-Lun Chou as follows:
“Statistics is a method of decision making in the face of uncertainty
on the basis of numerical data and calculated risks.” These definitions
reflect the applications of Statistics in Business since modern business 
has its roots in the accuracy and precision of the estimates and 
Statistical forecasting regarding the future demand for the product, 
market trends and so on. Business forecasting techniques which are
based on the compilation of useful statistical information on lead 
and lag indicators are very useful for obtaining estimates which 
serve as a guide to future economic events. Wrong expectations,
which might be the result of faulty and inaccurate analysis of the
various factors affecting a particular phenomenon, might lead the
businessman to disaster. Time series analysis is a very important
statistical tool which is used in business for the study of:


(i) Trend (by the method of curve fitting by the principle of least
squares) in order to obtain estimates of the probable demand for
the goods, and


(ii) Seasonal and Cyclical movements in the phenomenon,
for determining the ‘Business Cycle’, which may also be termed the
four-phase cycle composed of prosperity (period of boom), recession,
depression and recovery. The upswings and downswings in business
depend on the cumulative nature of the economic forces (affecting
the equilibrium of supply and demand) and the interaction between
them. Most business and commercial series, e.g., series relating
to prices, production, consumption, profits, investments, wages, etc.,
are affected to a great extent by business cycles. Thus the study of
business cycles is of paramount importance in business, and a
businessman who ignores the effects of booms and depressions is bound
to fail since his estimates and forecasts will definitely be faulty.


The studies of Economic Barometers (Index Numbers of Prices)
enable the businessman to have an idea about the purchasing power
of money. The statistical tools of demand analysis enable the
businessman to strike a balance between supply and demand. [For
details, see Statistics in Economics.]


The technique of Statistical Quality Control, through the
powerful tools of Control Charts and Inspection Plans is indispens- 
able to any business organisation for ensuring that the quality of the 
manufactured product is in conformity with the consumer's speci- 
fications. (For details see Statistics in Industry.) 


Statistical tools are used widely by business enterprises for the
promotion of new business. Before embarking upon any production
process, the business house must have an idea about the quantum
of the product to be manufactured, the amount of raw material
and labour needed for it, the quality of the finished product,
marketing avenues for the product, the competitive products in the
market and so on. Thus the formulation of a production plan is a
must, and this cannot be achieved without collecting statistical
information on the above items by resorting to the powerful
technique of Sample Surveys. As such, most of the leading business
and industrial concerns have full-fledged statistical units with trained
and efficient statisticians for formulating such plans and arriving
at valid decisions in the face of uncertainty with calculated risks.
These units also carry on research and development programmes for
the improvement of the quality of the existing products (in the light
of the competitive products in the market), the introduction of new
products and the optimisation of profits with the existing resources at
their disposal.


Statistical tools of probability and expectation are extremely
useful in Life Insurance which is one of the pioneer branches of 
Business and Commerce to use Statistics since the end of the seven- 
teenth century. [For details see Statistics in Insurance] 


Statistical techniques have also been used very widely by 
business organisations in : 

(i) Carrying out Time and Motion studies (which are a part of
scientific management).

(ii) Marketing Decisions (based on the statistical analysis of
consumer preference studies: demand analysis).

(iii) Investment (based on a sound study of individual shares and
debentures). 

(iv) Personnel Administration (for the study of statistical data 
relating to wages, cost of living, incentive plans, effect of labour 
dispute/unrest on the production, performance standards etc.,). 

(v) Credit policy. 

(vi) Inventory Control (for co-ordination between production 
and sales). 

(vii) Accounting (for evaluation of the assets of the business
concerns). [For details see Statistics in Accountancy and Auditing]. 

(viii) Sales Control (through the statistical data pertaining to 
market studies, consumer preference studies, trade channel studies 
and readership surveys etc.) and so on. 

From the above discussion it is obvious that the use of statisti- 
cal data and techniques is indispensable in almost all the branches 
of business activity. 


Statistics in Industry. In industry, Statistics is extensively
used in ‘Quality Control’. The main objective in any production
process is to control the quality of the manufactured product so
that it conforms to specifications. This is called process control
and is achieved through the powerful technique of control charts
and inspection plans. The control chart was developed
by a young physicist, Dr. W. A. Shewhart of the Bell Telephone
Laboratories (U.S.A.), in 1924 and the following years, and is based
on setting the 3σ (3-sigma) control limits, which have their basis in
the theory of probability and the normal distribution. Inspection plans
are based on a special kind of sampling technique, which is a very
important aspect of statistical theory.
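
A minimal sketch of how 3-sigma limits might be set for a chart of sample means is given below. It assumes, purely for illustration, that the process mean and standard deviation are known (in practice they are usually estimated from past samples) and that each sample contains 4 items. The function names and figures are hypothetical.

```python
import math


def xbar_control_limits(process_mean, process_sigma, sample_size):
    """Lower and upper 3-sigma limits for the mean of a sample of the given size."""
    half_width = 3 * process_sigma / math.sqrt(sample_size)
    return process_mean - half_width, process_mean + half_width


def out_of_control(sample_means, lcl, ucl):
    """Positions (0-based) of the sample means falling outside the control limits."""
    return [i for i, m in enumerate(sample_means) if m < lcl or m > ucl]


# Hypothetical process: mean 50, standard deviation 2, samples of 4 items each.
lcl, ucl = xbar_control_limits(process_mean=50.0, process_sigma=2.0, sample_size=4)
sample_means = [50.4, 49.1, 53.6, 50.2, 46.5]
flagged = out_of_control(sample_means, lcl, ucl)
```

Here the limits work out to 47 and 53, so the third and fifth samples would be flagged for investigation.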

Statistics in Astronomy. Even in the ancient past,
recordings were made of the movements of heavenly
bodies like stars and planets for the study of eclipses. J. Kepler
propounded his famous three laws relating to the movements of heavenly
bodies after making a detailed study of the statistical data
collected by Tycho Brahe (1546-1601) regarding the movements of
the planets. It was on the basis of Kepler’s laws that Sir Isaac
Newton later developed his famous law of gravitation. The Principle
of Least Squares, one of the most important tools in statistical
theory, was developed by Gauss, who used it to obtain the equation
of the famous ‘Normal Law of Errors’ in Astronomy in the beginning
of the 19th century (1809). Gauss used the normal curve to describe
the theory of accidental errors of measurement involved in the
calculation of the orbits of heavenly bodies.


Statistics in Physical Sciences. The applications of Statis- 
tics in Astronomy, which is a physical science, have already been 
discussed above. In physical sciences, a large number of measure- 
ments are taken on the same item. There is bound to be variation 
in these measurements. In order to have an idea about the degree 
of accuracy achieved, the statistical techniques (Interval Estimation 
—confidence intervals and confidence limits) are used to assign 
certain limits within which the true value of the phenomenon may 
be expected to lie. The desire for precision was first felt in the physical
sciences, and this led science to express the facts under study in
quantitative form. The statistical theory with the powerful tools of
sampling, estimation (point and interval), design of experiments 
etc., is most effective for the analysis of the quantitative expression 
of all fields of study. Today, there is an increasing use of Statistics 
in most of the physical sciences such as astronomy, geology, engi- 
neering, physics and meteorology. 
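
The idea of assigning limits to repeated measurements can be sketched as follows. The example is only an approximation under stated assumptions: it uses the normal distribution for the limits (for a small number of measurements Student's t would normally be used instead), and the readings and the function name are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(measurements, confidence=0.95):
    """Approximate confidence limits for the true value, using the normal distribution."""
    n = len(measurements)
    centre = mean(measurements)
    standard_error = stdev(measurements) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95 per cent
    return centre - z * standard_error, centre + z * standard_error


# Hypothetical repeated measurements of the same item.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
lower, upper = confidence_interval(readings)
```

The interval is centred on the mean of the readings and becomes narrower as more measurements are taken or as their variation decreases.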


Statistics in Social Sciences. According to Bowley, “Statistics
is the science of the measurement of social organism, regarded
as a whole in all its manifestations.” In the words of W.I. King,
“The science of Statistics is the method of judging collective, natural
or social phenomenon from the results obtained from the analysis
or enumeration or collection of estimates.” These words of Bowley
and King amply reflect the importance of Statistics in the social
sciences.


Every social phenomenon is affected to a marked extent by a 
multiplicity of factors which bring out the variation in observations 
from time to time, place to place and object to object. Statistical 


ing to any strata of society and then analysing the results and 


“Without an adequate understanding of the statistical methods, 
the investigator in the social sciences may be like the blind man grop- 
ing in a dark room for a black cat that is not there. The methods of 
Statistics are useful in an ever-widening range of human activities in
any field of thought in which numerical data may be had.” 
Statistics in Biology and Medical Sciences. Sir Francis
Galton (1822-1911), a British biometrician, pioneered the use of
statistical methods with his work on ‘Regression’ in connection with
the inheritance of stature. According to Prof. Karl Pearson (1857-1936),
who pioneered the study of ‘Correlation Analysis’, the whole
theory of heredity rests on a statistical basis. In his Grammar of
Science he says, “The whole problem of evolution is a problem of
vital statistics, a problem of longevity, of fertility, of health, of disease,
and it is as impossible for the evolutionist to proceed without statistics as
it would be for the Registrar General to discuss the national mortality
without an enumeration of the population, a classification of deaths and
a knowledge of statistical theory.”


In medical sciences also, the statistical tools for the collection,
presentation and analysis of observed factual data relating to the
causes and incidence of diseases are of paramount importance. For
example, the factual data relating to pulse rate, body temperature,
blood pressure, heart beats, weight, etc., of the patient greatly help
the doctor in the proper diagnosis of the disease; statistical papers
are used to study heart beats through the electro-cardiogram (E.C.G.).
Perhaps the most important application of Statistics in medical
sciences lies in using tests of significance (more precisely, Student’s
t-test) for testing the efficacy of a manufactured drug, injection or
medicine for controlling or curing specific ailments. The testing of the
effectiveness of a medicine by the manufacturing concern is a must,
since only after the effectiveness of the medicine has been established by
sound statistical techniques will it venture to manufacture it on
a large scale and bring it out in the market. Comparative studies
of the effectiveness of different medicines made by different concerns can
also be made by statistical techniques (t and F tests of significance).


1.4. Limitations of Statistics. Although Statistics is
indispensable to almost all sciences (social, physical and natural)
and is very widely used in almost all spheres of human activity, it
is not without limitations which restrict its scope and utility.


1. Statistics does not study qualitative phenomena. ‘Statistics
are numerical statements in any department of enquiry placed in
relation to each other.’ Since Statistics is a science dealing with a
set of numerical data, it can be applied to the study of only those
phenomena which can be measured quantitatively. Thus
statements like ‘the population of India has increased considerably
during the last few years’ or ‘the standard of living of the people
in Delhi has gone up as compared with last year’ do not constitute
Statistics. As such, Statistics cannot be used directly for the study
of quality characteristics like health, beauty, honesty, welfare,
poverty, etc., which cannot be measured quantitatively. However,
the techniques of statistical analysis can be applied to qualitative
phenomena indirectly by expressing them numerically after
assigning particular scores or quantitative standards. For instance,
the attribute of intelligence in a group of individuals can be studied
on the basis of their intelligence quotients (I.Q.'s), which may be
regarded as the quantitative measure of the individuals’ intelligence.

2. Statistics does not study individuals. According to Prof.
Horace Secrist, “By Statistics we mean aggregate of facts affected
to a marked extent by multiplicity of factors...and placed in relation
to each other.” Thus a single or isolated figure cannot be
regarded as Statistics unless it is a part of the aggregate of facts
relating to any particular field of enquiry. Thus statistical methods
do not give any recognition to an object or a person or an event in
isolation. This is a serious limitation of Statistics. For instance,
the price of a single commodity, the profit of a particular concern
or the production of a particular business house do not constitute
statistics since these figures are unrelated and not comparable. However,
the aggregate of figures relating to prices and consumption of
various commodities, the sales and profits of a business house, the
income, expenditure, production, etc., over different periods of
time, places, etc., will be Statistics. Thus, from the statistical point of
view, the figure of the population of a particular country in some
given year is useless unless we are also given the figures of the
population of the country for different years or of different countries
for the same year for comparative studies. Hence Statistics
is confined only to those problems where group characteristics are
to be studied.


3. Statistical laws are not exact. Since statistical laws
are probabilistic in nature, inferences based on them are only
approximate and not exact like the inferences based on mathematical
or scientific (physical and natural sciences) laws. Statistical
laws are true only on the average. If the probability of getting a
head in a single throw of a coin is 1/2, it does not imply that if we
toss a coin 10 times, we shall get five heads and five tails. In 10
throws of a coin we may get 8 heads, 9 heads or all 10 heads,
or we may not get even a single head. What the law does mean is that if the
experiment of throwing the coin is carried on indefinitely (a very
large number of times), then we should expect on the average 50%
heads and 50% tails.
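
The point can be checked numerically. The sketch below assumes a fair coin and independent throws and uses the binomial formula. The function name `prob_heads` is introduced only for this illustration.

```python
from math import comb


def prob_heads(k, n, p=0.5):
    """Probability of exactly k heads in n independent throws (binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_five = prob_heads(5, 10)    # exactly five heads and five tails
p_none = prob_heads(0, 10)    # not even a single head
p_eight_or_more = sum(prob_heads(k, 10) for k in range(8, 11))
```

Exactly five heads in ten throws has a probability of 252/1024, i.e. only about 0.246, so the "expected" result occurs in fewer than one set of ten throws out of four. Eight or more heads has a probability of 56/1024 (about 0.055), and no head at all a probability of 1/1024.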

4. Statistics is liable to be misused. Perhaps the most significant
limitation of Statistics is that it must be used by experts.
According to Bowley, “Statistics only furnishes a tool, though
imperfect, which is dangerous in the hands of those who do not
know its use and deficiencies.” Statistical methods are the most
dangerous tools in the hands of the inexpert. Statistics is one of
those sciences whose adepts must exercise the self-restraint of an
artist. The greatest limitation of Statistics is that it deals with figures
which are innocent in themselves and do not bear on their face
the label of their quality, and which can be easily distorted, manipulated
or moulded by politicians, dishonest or unskilled workers and
unscrupulous people for personal selfish motives. Statistics neither proves
nor disproves anything. It is merely a tool which, if rightly used,
may prove extremely useful but, if misused by inexperienced,
unskilled and dishonest statisticians, might lead to very fallacious
conclusions and even prove to be disastrous. In the words of W.I.
King, “Statistics are like clay of which you can make a God or a
Devil as you please.” At another place he remarks, “Science of
Statistics is the useful servant but only of great value to those who
understand its proper use.”


Thus the requirement that Statistics be used by experts who are well
experienced and skilled in the analysis and interpretation of statistical data
for drawing correct and valid inferences very much reduces the
chances of mass popularity of this important science.


1.5. Distrust of Statistics. The improper use of statistical
tools by unscrupulous people with an improper statistical bent of
mind has led to public distrust of Statistics. By this we mean
that the public loses its belief, faith and confidence in the science
of Statistics and starts condemning it. Irresponsible, inexperienced
and dishonest persons who use statistical data and statistical
techniques to fulfil their selfish motives have discredited the science
of Statistics and provoked some very interesting comments, some of
which are stated below :


(i) An ounce of truth will produce tons of Statistics. 
(ii) Statistics can prove anything. 
(iii) Figures do not lie. Liars figure.
(iv) Statistics is an unreliable science. 


(v) There are three types of lies—lies, damned lies and Statis- 
tics, wicked in the order of their naming ; and so on. 


Some of the reasons for the above remarks may be enumerated 
as follows : 


(a) Figures are innocent and believable, and the facts based on
them are psychologically more convincing. But it is a pity that
figures do not have the label of quality on their face.

(b) Arguments are put forward to establish certain results
which are not true by making use of inaccurate figures or by using
incomplete data, thus distorting the truth.

(c) Though accurate, the figures might be moulded and manipulated
by dishonest and unscrupulous persons to conceal the truth
and present a wrong and distorted picture of the facts to the public
for personal and selfish motives.

Hence, if Statistics and its tools are misused, the fault does not
lie with the science of Statistics. Rather, it is the people who misuse
it who are to be blamed.


Utmost care and precaution should be taken in the interpretation
of statistical data in all its manifestations. “Statistics should
not be used as a blind man uses a lamp-post, for support instead of
illumination.” However, the argument that Statistics can be used
effectively only by expert statisticians is itself open to question,
as the following remark due to Wallis and Roberts shows :
“He who accepts statistics indiscriminately will often be duped
unnecessarily. But he who distrusts statistics indiscriminately will
often be ignorant unnecessarily. There is an accessible alternative
between blind gullibility and blind distrust. It is possible to
interpret statistics skilfully. The art of interpretation need not be
monopolized by statisticians, though, of course, technical statistical
knowledge helps. Many important ideas of technical statistics can be
conveyed to the non-statistician without distortion or dilution.
Statistical interpretation depends not only on statistical ideas but
also on ordinary clear thinking. Clear thinking is not only
indispensable in interpreting statistics but is often sufficient even
in the absence of specific statistical knowledge. For the statistician
not only death and taxes but also statistical fallacies are
unavoidable. With skill, common sense, patience and above all
objectivity, their frequency can be reduced and their effects
minimised. But eternal vigilance is the price of freedom from serious
statistical blunders.”


We give below some illustrations regarding the mis-interpreta- 
tion of statistical data. 


1. “The number of car accidents committed in a city in a
particular year by women drivers is 10 while those committed by
men drivers is 40. Hence women are safe drivers.” The statement
is obviously wrong since nothing is said about the total number of
men and women drivers in the city in the given year. Some valid
conclusions can be drawn only if we are given the proportion of male
drivers and the proportion of female drivers who committed accidents.
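A small numerical sketch shows how the conclusion can reverse once the
number of drivers is known. The accident counts (10 and 40) are those of
the illustration; the driver totals (500 women, 4,000 men) are purely
hypothetical figures assumed here for the sake of the example.

```python
def accidents_per_1000(accidents, drivers):
    """Accident rate expressed per 1,000 drivers."""
    return 1000 * accidents / drivers

# Accident counts come from the illustration above;
# the driver totals are hypothetical assumptions.
women_rate = accidents_per_1000(accidents=10, drivers=500)
men_rate = accidents_per_1000(accidents=40, drivers=4000)

print(women_rate, men_rate)
```

With these assumed totals the rate for women (20 per 1,000) is double that
for men (10 per 1,000), the opposite of what the raw counts suggest. A
different choice of totals could of course favour women; the point is only
that the counts alone settle nothing.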

2. “It has been found that 25% of the surgical operations
by a particular surgeon are successful. If he is to operate on four
persons on any day and three of the operations have proved
unsuccessful, the fourth must be a success.” The given conclusion is
not true since statistical laws are probabilistic in nature and not
exact. If three operations on a particular day are unsuccessful, it
may well happen that the fourth operation is also unsuccessful. It may
also happen that on one day two or three or even all the four
operations are successful. The statement means only that as the number
of operations becomes larger and larger, we should expect, on the
average, 25% of the operations to be successful.
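The argument can be made concrete with a short calculation. This is a
sketch only, and it rests on an assumption the illustration does not
state: that the outcomes of the four operations are independent, each
succeeding with probability 0.25.

```python
p_success = 0.25          # success rate quoted in the illustration
p_fail = 1 - p_success

# Assumption: operations are independent, so the outcome of the fourth
# operation is unaffected by the three failures that preceded it.
p_fourth_success = p_success

# Probability that all four operations on a day are unsuccessful.
p_all_four_fail = p_fail ** 4

# Probability of two or more successes out of four.
p_exactly_one = 4 * p_success * p_fail ** 3
p_at_least_two = 1 - p_all_four_fail - p_exactly_one

print(p_fourth_success, p_all_four_fail, p_at_least_two)
```

Under this assumption a day with four failures is not even unusual (its
probability is about 0.32), and the chance of success on the fourth
operation remains 0.25 whatever happened earlier.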


3. A report : “The number of traffic accidents is lower in
foggy weather than on clear weather days. Hence it is safer to
drive in fog.”

The statement again is obviously wrong. To arrive at any
valid conclusions we must take into account the difference between
the rush of traffic under the two weather conditions and also the
extra cautiousness observed when driving in bad weather.


4. “80% of the people who drink alcohol die before attaining
the age of 70 years. Hence drinking is harmful for longevity of
life.” This statement is also fallacious since no information is given
about the number of persons who do not drink alcohol and die
before attaining the age of 70 years. In the absence of information
about the proportion of such persons we cannot draw any
valid conclusions.


5. Incomplete data usually leads us to fallacious conclusions. 
Let us consider the scores of two students Ram and Shyam in three 
tests during a year. 


                  1st test   2nd test   3rd test   Average Score

Ram's Score         50%        60%        70%          60%

Shyam's Score       70%        60%        50%          60%

If we are given only the average score, which is 60% in each case,
we will conclude that the level of intelligence of the two students at
the end of the year is the same. But this conclusion is false and
misleading since a careful study of the detailed marks over the three
tests reveals that Ram has improved consistently while Shyam has
deteriorated consistently.
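The comparison can be reproduced in a few lines. The sketch below uses the
scores from the table above; the helper name summarize and the choice of
"last score minus first score" as a crude measure of trend are our own
assumptions, made only for illustration.

```python
from statistics import mean

# Scores (in per cent) of the two students in the three tests.
scores = {"Ram": [50, 60, 70], "Shyam": [70, 60, 50]}

def summarize(marks):
    """Return the average and the change from the first test to the last."""
    return {"average": mean(marks), "change": marks[-1] - marks[0]}

summary = {name: summarize(marks) for name, marks in scores.items()}

for name, result in summary.items():
    print(name, result)
```

Both averages come out at 60, while the change is +20 points for Ram and
-20 points for Shyam, which is exactly the information the average alone
conceals.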
Remark. Numerous such examples can be constructed to
illustrate the misuse of statistical methods, and this is all due to
their injudicious application and interpretation, for which the science
of Statistics cannot be blamed.


EXERCISE 1.1. 


1. (a) Write a short essay on the origin and development of the science
of Statistics.

(b) Give the names of some of the veterans in the development of
Statistics, along with their contributions.


2. (a) Discuss the utility of statistics to the state, the economist, the
industrialist and the social worker.
(Mysore U. B. Com. April 1982)
(b) Define “Statistics” and discuss the importance of statistics in a
planned economy.
(Nagarjuna U. B. Com. April 1981)

3. (a) Define the term “Statistics” and discuss the importance of statistics
in business. (Delhi U. B. Com. External 1982)

(b) Define the term “Statistics” and discuss its functions and limitations.
[Delhi U. B. Com. 1983 ; Punjab U. B. A. (Econ. Hons.) 1981]

4. Explain critically a few of the definitions of Statistics and state the
one which you think to be the best.
(Delhi U. B. Com. 1982)


5. “Statistics is a method of decision-making in the face of uncertainty
on the basis of numerical data and calculated risks.” Explain with suitable
illustrations.

[AIMA (Diploma in Management) May 1978]
6. Comment briefly on the following statements.
(a) “Statistics is the Science of human welfare.”

(b) “To a very striking degree our culture has become a Statistical
culture.”
(c) “Statistical thinking will one day be as necessary for efficient
citizenship as the ability to read and write.”
(Punjab U. B. Com. 1980)


7. (a) Comment briefly on the following statements :
(i) “Statistics can prove anything.”
(ii) “Statistics affects everybody and touches life at many points.
It is both a science and an art.”
(Mysore U. B. Com. Nov. 1980 ; Punjab U. B. Com. Sept. 1981)
(b) “He who accepts statistics indiscriminately will often be duped
unnecessarily, but he who distrusts statistics indiscriminately will often
be ignorant unnecessarily.” Comment on the above statement.

(Guru Nanak Dev U. B. Com. 1981)


8. “Sciences without statistics bear no fruit, statistics without sciences
have no root.” Explain the above statement with necessary comments.


9. Comment on the following statements, illustrating your view point
with suitable examples :

(a) “Knowledge of Statistics is like a knowledge of foreign language or of
algebra. It may prove of use at any time under any circumstances.” (Bowley)

(b) “Statistics is what statisticians do.”

(c) “There are lies, damned lies and Statistics, wicked in the order of
their naming.”

(d) “By Statistics we mean aggregate of facts affected to a marked extent
by multiplicity of causes, numerically expressed, enumerated or estimated
according to reasonable standards of accuracy, collected in a systematic
manner for a pre-determined purpose and placed in relation to each other.”
(Horace Secrist)

(e) “Statistics are the straws out of which I, like every other economist,
have to make the bricks.” (Marshall)

(f) “Statistics are like bikinis : they reveal what is interesting and
conceal what is vital.”
(Punjab U. B. Com. 1980)


10. (a) What do you understand by distrust of Statistics ? Is the science
of Statistics to be blamed for it ?

(b) Write a critical note on the limitations and distrust of Statistics.
Discuss the important causes of distrust and show how Statistics could be made
reliable. (Guru Nanak Dev U. B. Com. Sept. 1980)

(c) Define ‘Statistics’ and discuss its scope and limitations.
(Kurukshetra U. B. Com. Sept. 1980)
(d) Discuss the use of Statistics in the fields of economics, trade and
commerce. What are the limitations of statistics ?
(Mysore U. B. Com. Nov. 1981)
11. (a) “Statistical methods are most dangerous tools in the hands of
the inexperts.” Discuss and explain the limitations of statistics.
(Guru Nanak Dev U. B. Com. 1977)
(b) “The science of statistics, then, is a most useful servant but only of
great value to those who understand its proper use.” (King)
Comment on the above statement and discuss the limitations of statistics.
[Kurukshetra U. B. Com. 1978 ; Punjab U. B. A. (Econ. Hons.) 1982]
(c) “Statistics are like clay of which you can make a God or Devil, as
you please.”
In the light of this statement, discuss the use and limitations of Statistics.
(Kurukshetra U. B. Com. Sept. 1980)
(d) “All statistics are numerical statements but all numerical statements
are not statistics.”

Comment in about 10 lines.
(Mysore U. B. Com. Nov. 1980)


12. Comment on the following statements :
(a) “Statistics are like clay of which you make a God or Devil, as you
please.”
(b) “Statistics is the science of estimates and probabilities.”
(c) “Statistics is the science of counting.”
(Punjab U. B. Com. 1981)


13. Comment on the following statistical statements, bringing out in
detail the fallacies, if any :

(i) “A survey revealed that the children of engineers, doctors and
lawyers have high intelligence quotients (I.Q.). It further revealed that
the grandfathers of these children were also highly intelligent. Hence the
inference is that intelligence is hereditary.”

(ii) “The number of deaths in military in a recent war in a country was
15 out of 1,000 while the number of deaths in the capital of the country during
the same period was 22 per thousand. Hence it is safer to join military service
than to live in the capital city of the country.”

(iii) “The number of accidents taking place in the middle of the road is
much less than the number of accidents taking place on its sides. Hence it is
safer to walk in the middle of the road.”

(iv) “The frequency of divorce for couples with the children is only about 
tof that for childless couples ; therefore producing children is an effective check 
on divorce." 

(v) “The increase in the price of a commodity was 20%. Then the price
decreased by 15% and again increased by 10%. So the resultant increase in the
price was 20 - 15 + 10 = 15%.”

(vi) Nutritions Bread Company, a private manufacturing concern,
charges a lower rate per loaf than that charged by a Government of India
Undertaking ‘Modern Bread.’ Thus private ownership is more efficient than
public ownership.

(vii) According to the estimate of an economist, the per capita national
income of India for 1931-32 was Rs. 65. The National Income Committee
estimated the corresponding figure for 1948-49 as Rs. 225. Hence in 1948-49
Indians were nearly four times as prosperous as in 1931-32.

14. Point out the ambiguity or mistakes found in the following
statements which are made on the basis of the facts given :

(a) 80% of the people who die of cancer are found to be smokers and
so it may be concluded that smoking causes cancer.

(b) The gross profit to sales ratio of a company was 15% in the year
1974 and was 10% in 1975. Hence the stock must have been undervalued.

(c) The average output in a factory was 2,500 units in January 1981 and
2,400 units in February 1981. So workers were more efficient in January 1981.

(d) The rate of increase in the number of buffaloes in India is greater
than that of the population. Hence the people of India are now getting more
milk per head.
[Osmania U. B. Com. (Hons.) Nov. 1981]
15. Comment on the following :

(a) 50 boys and 50 girls took an examination. 30 boys and 40 girls got
through the examination. Hence girls are more intelligent than boys.

(b) The average monthly incomes in two cities of Hyderabad and
Madras were found to be Rs. 330. Hence, the people of both the cities have
the same standard of living.

(c) A tutorial college advertised that there was 100 per cent success of
the candidates who took the coaching in their institute. Hence the college has
got good faculty.

[Osmania U. B. Com. (Hons.) April 1983]
2.1. Introduction. As pointed out in Chapter 1, Statistics
are a set of numerical data. (See definitions of Secrist, Croxton and
Cowden etc.) In fact only numerical data constitute Statistics.
This means that the phenomenon under study must be capable of
quantitative measurement. Thus the raw material of Statistics
always originates from the operation of counting (enumeration) or
measurement. For any statistical enquiry, whether it is in business,
economics or social sciences, the basic problem is to collect facts
and figures relating to the particular phenomenon under study. The
person who conducts the statistical enquiry i.e., counts or measures
the characteristic under study for further statistical analysis, is
known as the investigator. Ideally (though a costly presumption), the
investigator should be a trained and efficient statistician. But in
practice, this is not always or even usually so. The persons from
whom the information is collected are known as respondents and the
items on which the measurements are taken are called the statistical
units. [For details see § 2.1.2]. The process of counting or
enumeration or measurement together with the systematic recording of
results is called the collection of statistical data. The entire structure
of the statistical analysis for any enquiry is based upon systematic
collection of data.


On the face of it, it might appear that the collection of data
is the first step for any statistical investigation. But in a
scientifically prepared (efficient and well planned) statistical
enquiry, the collection of data is by no means the first step. Before
we embark upon the collection of data for a given statistical enquiry,
it is imperative to examine carefully the following points which may be
termed as preliminaries to data collection :

(i) Objectives and scope of the enquiry.
(ii) Statistical units to be used.
(iii) Sources of information (data).
(iv) Method of data collection.
(v) Degree of accuracy aimed at in the final results.
(vi) Type of enquiry.

We shall discuss these points briefly in the following sections.


2.1.1. Objectives and Scope of the Enquiry. The first and
foremost step in organising any statistical enquiry is to define in
clear and concrete terms the objectives of the enquiry. This is very
essential for determining the nature of the statistics (data) to be
collected and also the statistical techniques to be employed for the
analysis of the data. The objectives of the enquiry would help in
eliminating the collection of irrelevant information which is never
used subsequently and also reflect upon the uses to which such
information can be put. In the absence of the purpose of the enquiry
being explicitly specified, we are bound to collect irrelevant
information and also omit some important information, which will
ultimately lead to fallacious conclusions and wastage of resources.


Further, the scope of the enquiry will also have a great
bearing upon the data to be collected and also the techniques
to be used for its collection and analysis. Scope of the enquiry
relates to the coverage with respect to the type of information,
subject matter and geographical area. For instance, if we want to
study the cost of living index numbers, it must be specified if they
relate to a particular city or state or the whole of India. Further, the
class of people (such as low-income group, middle-income group,
labour class etc.) for which they are intended should also be
specified clearly. If the investigation is to be on a very large
scale, the sample method of enumeration and collection will have
to be used. However, if the enquiry is confined only to a small
group, we may undertake 100% enumeration (census method). Thus
the nature of the enquiry when its scope is very wide is totally
different from its nature when the scope is narrow.


Thus the decision about the type of enquiry to be conducted 
depends largely on the objectives and scope of the enquiry. How- 
ever, the organisers of the enquiry should take care that these objec- 
tives and scope are commensurate with the available resources in 
terms of money, manpower and time limit required for the 
availability of the results of the enquiry. 


2.1.2. Statistical Units to be Used. A well defined and
identifiable object or a group of objects with which the
measurements or counts in any statistical investigation are associated is
called a statistical unit. For example, in a socio-economic survey
the unit may be an individual person, a family, a household or a
block of locality. A very important step before the collection of
data begins is to define clearly the statistical units on which the
data are to be collected. In a number of situations the units are
conventionally fixed like the physical units of measurement such as
metres, kilometres, kilograms, quintals, hours, days, weeks etc.,
which are well defined and do not need any elaboration or
explanation. However, in many statistical investigations, particularly
relating to socio-economic studies, arbitrary units are used which
must be clearly defined. This is imperative since in the absence
of a clear cut and precise definition of the statistical units, serious
errors in the data collection may be committed in the sense that we
may collect irrelevant data on items which should have, in fact,
been excluded and omit data on certain items which should have
been included. This will ultimately lead to fallacious conclusions.


REQUISITES OF A STATISTICAL UNIT


The following points might serve as guidelines for deciding 
about the unit in any statistical enquiry. 

1. It should be unambiguous. A statistical unit should be
rigidly defined so that it does not lead to any ambiguity in its
interpretation. The units must cover the entire population and they
should be distinct and non-overlapping in the sense that every
element of the population belongs to one and only one statistical
unit.

2. It should be specific. The statistical unit must be precise
and specific, leaving nothing to the discretion of the investigators.
Quite often, in socio-economic surveys the various concepts or
characteristics can be interpreted in different ways and accordingly
the variable used to measure them may be defined in several different
ways. For example, in an enquiry relating to the wage level of
workers in an industrial concern the wages might be weekly wages,
monthly wages or might refer to those of skilled labour only or of
day workers only or might include bonus payments also. Similarly
prices in an enquiry might refer to cost prices, selling prices, retail
prices, whole-sale prices or contract prices. Thus in a statistical
enquiry it is important to distinguish between the conventional and
the arbitrary definitions of the characteristics/variables, the former
being the one prevalent in common use, which shall always remain the
same (fixed) for every enquiry, while the latter is the one which is
used in a specific sense and refers to the working or operational
definition, which will keep on changing from one enquiry to another.

3. It should be stable. The unit selected should be stable over
a long period of time and also w.r.t. places i.e., there should not be
significant fluctuations in the value of a unit at different intervals of
time or at different places because in the contrary case, the data
collected at different times or places will not be comparable and
this would mar their utility to a great extent. The fluctuations in
the value of money at different times (due to inflation) or in the
measurement of weights at different places (due to height above sea
level) might render the comparisons useless. Thus, the unit selected
should imply, as far as possible, the same characteristics at different
times or at different places.
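The effect of an unstable unit such as money can be illustrated by
expressing figures of different years in the prices of a common base year.
The sketch below is illustrative only: the wage figures, the price index
values and the function name real_value are all hypothetical assumptions,
not data from the text.

```python
def real_value(nominal, price_index, base=100.0):
    """Convert a money figure to base-year prices using a price index
    (index = 100 in the base year)."""
    return nominal * base / price_index

# Hypothetical monthly wages (in rupees) and hypothetical price indices.
wage_year_1 = real_value(nominal=500.0, price_index=100.0)
wage_year_2 = real_value(nominal=900.0, price_index=200.0)

print(wage_year_1, wage_year_2)
```

In money terms the wage has risen from Rs. 500 to Rs. 900, but with prices
assumed to have doubled, its value in base-year rupees has fallen from
Rs. 500 to Rs. 450. Comparing the unadjusted rupee figures would therefore
point in the wrong direction.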


4. It should be appropriate to the enquiry. As already pointed
out earlier, the concept and definition of arbitrary statistical units
keep on changing from enquiry to enquiry. The unit selected must
be relevant to the given enquiry. Thus, for studying the changes
in the general price level, the appropriate unit is the whole-sale
prices while for constructing the cost of living indices (or consumer
price indices) the appropriate unit is the retail prices.


5. It should be uniform. It is essential that the unit adopted
should be homogeneous (uniform) throughout the investigation so
that the measurements obtained are comparable. For example, in
measuring length if we use a yard on some occasions and metre on
other occasions in an investigation, the observations obtained would
be confusing and misleading.
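Where mixed units have already crept into the records, the observations
must be brought to a single unit before any comparison. The sketch below
assumes each reading is stored together with its unit; the readings
themselves and the function name to_metres are hypothetical. The
conversion factor (1 yard = 0.9144 metre) is the standard international
definition.

```python
METRES_PER_YARD = 0.9144  # exact, by international definition of the yard

def to_metres(value, unit):
    """Convert a length recorded in metres ('m') or yards ('yd') to metres."""
    if unit == "m":
        return value
    if unit == "yd":
        return value * METRES_PER_YARD
    raise ValueError(f"unknown unit: {unit}")

# Hypothetical readings recorded in mixed units.
readings = [(10.0, "yd"), (10.0, "m"), (25.0, "yd")]
lengths_m = [to_metres(value, unit) for value, unit in readings]

print(lengths_m)
```

The first two readings look identical as bare numbers (10 and 10) but
differ by nearly a metre once expressed in the same unit.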


TYPES OF STATISTICAL UNITS 
The statistical units may be broadly classified as follows : 
(i) Units of collection.
(ii) Units of analysis and interpretation.


(i) Units of Collection. The units of collection may further
be sub-divided into the following two classes :


(a) Unit of Enumeration. In any statistical enquiry, whether it
is conducted by ‘sample’ method or ‘census’ method, the unit of
enumeration is the basic unit on which the observations are to be
made, and this unit is to be decided in advance before conducting
the enquiry, keeping in view the objectives of the enquiry. The unit
of enumeration may be a person, a household, a family, a farm (in
land experiments), a shop, livestock, a firm etc. As has been
pointed out earlier, this unit should be very clearly defined in terms
of shape, size etc. For instance, for the construction of cost of
living index number, the proper unit of enumeration is the household.
It should be explained in clear terms whether a household consists
of a family comprising blood relations only or people taking food
in a common kitchen or all the persons living in the house or the
persons enlisted in the ration card only. The concept of the
household (to be used in the enquiry) is to be decided in advance and
explained clearly to the enumerators so that there are no essential
omissions or irrelevant inclusions.


(b) Units of Recording. The units of recording are the units
in terms of which the data are recorded or, in other words, they are
the units of quantification. For instance, in the construction of cost
of living index number (consumer price index) the data to be
collected from each household, among other things, include the retail
prices of various commodities together with the quantities consumed
by the class of people for whom the index is meant. The units of
recording for quantity may be weight (in case of food-grains), say,
in kilograms, quintals, tons, etc. ; in case of clothing the unit of
recording may be metres ; the prices may be recorded in terms of
rupees and so on.


Units of measurement (recording) may be simple or composite.
The units which represent only one condition without any
qualification (adjective) are called simple units such as metre, rupee, ton,
kilogram, pound, bale of cloth, hour, week, year, etc. Such units
are generally conventional and not at all difficult to define.
However, sometimes care has to be taken in their actual usage. For
example, the bale of cloth must be defined in terms of length, say,
20 metres, 50 metres or 100 metres. Similarly, in case of weight
it should be clearly specified whether it is net weight or gross
weight.


A simple unit with some qualifying words is called a composite
unit. A simple unit with only one qualifying word is called a
compound unit. Examples of such units are skilled worker, employed
person, ton-kilometre, kilowatt hour, man hours, retail prices,
monthly wages, passenger kilometres. For instance, ton-kilometre
means the number of tons multiplied by the number of kilometres
carried ; man hours implies the total number of workers multiplied
by the number of hours that each worker has put in and so on. If
more qualifying words are added to a simple unit, it is called a
complex unit such as production per machine hour, output per
man hour and so on. Thus as compared to simple units, composite
(compound and complex) units are much more restrictive in scope
and difficult to define. Such units should be defined properly and
clearly as they need explanation about the unit used and also
about the qualifying words.
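A short sketch may help fix the arithmetic behind these compound and
complex units. All figures below are hypothetical, and the function names
are ours. Man hours are computed here as the sum of the hours put in by
each worker, which reduces to "workers multiplied by hours" when every
worker puts in the same number of hours.

```python
def ton_kilometres(tons, kilometres):
    """Compound unit: tons carried multiplied by kilometres carried."""
    return tons * kilometres

def man_hours(hours_per_worker):
    """Compound unit: total hours put in, summed over all workers."""
    return sum(hours_per_worker)

def output_per_man_hour(units_produced, hours_per_worker):
    """Complex unit: production divided by the man hours expended."""
    return units_produced / man_hours(hours_per_worker)

# Hypothetical figures: 12 tons carried 150 km; 5 workers, 8 hours each;
# 200 units produced in the shift.
freight = ton_kilometres(12, 150)
labour = man_hours([8, 8, 8, 8, 8])
productivity = output_per_man_hour(200, [8, 8, 8, 8, 8])

print(freight, labour, productivity)
```

With these assumed figures the results are 1,800 ton-kilometres, 40 man
hours and 5 units of output per man hour.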


(ii) Units of Analysis and Interpretation. As the name
implies, the units of analysis and interpretation are those units in
the form of which the statistical data are ultimately analysed and
interpreted. It should be decided whether the results would be
expressed in absolute figures or relative figures. The units of analysis
and interpretation facilitate comparisons between different sets of
data with respect to time, place or environment (conditions).
Generally, the units of analysis are rates, ratios and percentages,
and coefficients.


Rates involve the comparison between two heterogeneous
quantities i.e., when the numerator and denominator are not of the
same kind e.g., the mortality (death) rates, the fertility (birth) rates
and so on. Rates are usually expressed per thousand. For instance,
the Crude Birth Rate (C.B.R.) is the ratio of the total number of live
births in the given region or locality during a given period to the
total population of that region or locality during the same period,
multiplied by 1,000. Rate per unit is called coefficient. However,
ratios and percentages are used for comparing homogeneous
quantities i.e., when the numerator and denominator are of the same
kind. For example, “the ratio of smokers to non-smokers in a
particular locality is 1:3” implies that 25% of the population
are smokers.
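The two calculations described above can be sketched as follows. The
birth and population figures are hypothetical, chosen only to illustrate
the formula; the 1:3 ratio is the one quoted in the text; the function
names are ours.

```python
def crude_birth_rate(live_births, population):
    """C.B.R.: live births per 1,000 of population in the same period."""
    return 1000 * live_births / population

def share_from_ratio(a, b):
    """Percentage of the whole formed by the first group when the two
    groups stand in the ratio a : b."""
    return 100 * a / (a + b)

# Hypothetical locality: 3,500 live births in a population of 100,000.
cbr = crude_birth_rate(live_births=3_500, population=100_000)

# Ratio of smokers to non-smokers quoted in the text, 1 : 3.
smokers_pct = share_from_ratio(1, 3)

print(cbr, smokers_pct)
```

The assumed figures give a crude birth rate of 35 per thousand, and the
ratio 1:3 gives 1 smoker in every 4 persons, i.e., 25%, not 33%.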


From the practical point of view, for comparing data relating to
different series, the unit of analysis is usually one which gives
relative figures which are pure numbers independent of the units of
measurement. For instance, if we want to compare two series for


Business Statistics 


(iv) Direct or Indirect.
(v) Regular or Ad-hoc, 
(vi) Census or Sample, 


(vii) Primary or Secondary. 


(i) Official, Semi-official or Un-official. A very important
factor in the collection of data is ‘the sponsoring agency of the
survey or enquiry.’ If an enquiry is conducted by or on behalf of
the central, state or local governments it is termed as official
enquiry. A semi-official enquiry is one that is conducted by
organisations enjoying government patronage like the Indian Council of
Agricultural Research (I.C.A.R.), New Delhi ; Indian Agricultural
Statistics Research Institute (I.A.S.R.I.), New Delhi ; Indian
Statistical Institute (I.S.I.), Calcutta and New Delhi ; and so on. An
un-official enquiry is one which is sponsored by private institutions
like the F.I.C.C.I., trade unions, universities or the individuals.
Obviously the facilities available for each type of the above
enquiries differ considerably. In an official enquiry, legal or
statutory compulsions can be exercised asking the public or
respondents to furnish the requisite information in time and that too


ments can afford to spend much more on an enquiry as compared 
with private institutions, which in turn, can generally spend more 
than an individual. Consequently, there is bound to be difference 
in the types of enquiries depending upon the sponsoring agency and 
also its financial implications. 


(ii) Initial or Repetitive. As the name suggests, an initial
or original enquiry is one which is conducted for the first time
while a repetitive enquiry is one which is carried on in continuation
or repetition of some previously conducted enquiry (enquiries).
In conducting an original enquiry the entire scheme of the plan,
starting with definitions of various terms, the units, the method of
collection etc., has to be formulated afresh whereas in a repetitive
enquiry there is no such problem as such a plan already exists and
only the original enquiry is to be modified to suit the current
situation and on the basis of the experience gained in the past
enquiry. However, for making valid conclusions in a repetitive
enquiry, it should be ascertained that there is not any material
change in the definitions of various terms used in the original
enquiry.
(iii) Confidential or Non-confidential. In a confidential
enquiry, the information collected and the results obtained are kept
confidential and they are not made known to the public. The
findings of such enquiries are meant only for the personal records of
the sponsoring agency. The enquiries conducted by private
organisations like trade unions, manufacturers’ associations, private
business concerns are usually of confidential nature. On the other
hand, the types of enquiries whose results are published and made
known to the general public are termed as non-confidential
enquiries. Most of the enquiries conducted by the state, private bodies
or even individuals are of this type.


(iv) Direct or Indirect. An enquiry is termed as direct
if the phenomenon under study is capable of quantitative
measurement such as age, weight, income, prices, quantities consumed and
so on. However, if the phenomenon under study is of a qualitative
nature which is not capable of quantitative measurement like
honesty, beauty, intelligence etc., the corresponding enquiry is termed
as indirect. In such an enquiry, the qualitative characteristic is
converted into a quantitative phenomenon by assigning an appropriate
standard which may represent the given attribute (qualitative
phenomenon) indirectly. For example, the study of the attribute
of intelligence may be made through the Intelligence Quotient
(I.Q.) score of a group of individuals in a given test.


(v) Regular or Ad-hoc. If the enquiry is conducted periodically
at equal intervals of time (monthly, quarterly, yearly etc.), it
is said to be a regular enquiry. For example, the census is conducted
in India periodically every 10 years. Similarly, a number of enquiries
are conducted by the Central Statistical Organisation (C.S.O.)
and their results are published periodically, such as the Monthly
Abstract of Statistics ; Monthly Statistics of Production of Selected
Industries of India ; Statistical Abstract, India (annually) ; and
Statistical Pocket Book, India (annually). On the other hand, if an
enquiry is conducted as and when necessary without any regularity or
periodicity, it is termed ad-hoc. For instance, the C.S.O. and the
N.S.S.O. (National Sample Survey Organisation) conduct a number of
ad-hoc enquiries.

(vi) and (vii). Enquiries of type (vi), viz., Census or Sample†,
and type (vii), Primary or Secondary, are discussed later in this
chapter.
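
The distinction drawn in (v) between regular and ad-hoc enquiries rests on whether the successive occasions are equally spaced in time. The following is only a minimal sketch of that check in Python. The function name `classify_enquiry` and the years of the ad-hoc survey are hypothetical and serve purely as illustration; the census years merely follow the ten-year interval mentioned above.

```python
from typing import Sequence


def classify_enquiry(years: Sequence[int]) -> str:
    """Label a series of enquiry dates as 'regular' or 'ad-hoc'.

    An enquiry is treated as regular when every pair of consecutive
    occasions is separated by the same interval of time.
    """
    ordered = sorted(years)
    if len(ordered) < 3:
        # Fewer than two intervals: periodicity cannot be judged.
        return "undetermined"
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return "regular" if len(gaps) == 1 else "ad-hoc"


census_years = [1951, 1961, 1971, 1981]      # decennial census of India
special_survey_years = [1968, 1971, 1979]    # hypothetical ad-hoc enquiry

print(classify_enquiry(census_years))          # regular
print(classify_enquiry(special_survey_years))  # ad-hoc
```

In practice the classification depends on the plan of the sponsoring agency rather than on the dates alone, so the check above should be read as a rough aid only.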


In any statistical enquiry, after deciding about the factors or 
problems enumerated above from § 2.1.1 to § 2.1.6, we are now all 
set for the process of actual collection of data relating to the given 
enquiry. In the following sections, we will discuss the methods of 
data collection. 

2.2. Primary and Secondary Data. After going through
the preliminaries discussed in the above section, we come to the
problem of data collection. The most important factor in any


† For details see § 2.1.4 and Chapter 15.

statistical enquiry is that the original collection of data is correct
and proper. If there are inadequacies, shortcomings or pitfalls at
the very source of the data, no useful and valid conclusions can be
drawn even after applying the best and most sophisticated techniques
of data analysis and presentation of the results. In this context, it
may be interesting to quote the remarks made by a judge on Indian
statistics : “Cox, when you are a bit older you will not quote Indian
statistics with that assurance. The governments are very keen on
amassing statistics ; they collect them, add them, raise them to the
nth power, take the cube root and prepare wonderful diagrams.
But what you must never forget is that every one of those figures
comes in the first instance from the ‘Chowkidar’ (i.e., the village
watchman) who just puts down what he damn pleases.”*


It may be remarked that this quotation applies to the India of
very old days, when no definite statistical set-up existed in the
country. Today in India, we have a fairly sound and systematic method
of data collection on almost all problems relating to diverse fields
such as economics, business, industry, demography, and the social,
physical and natural sciences. As already pointed out, the data may
be obtained from the following two sources :


(i) The investigator or the organising agency may conduct
the enquiry originally, or


(ii) He may obtain the necessary data for his enquiry from
some other sources (or agencies) who had already collected the data
on that subject.


Such data become secondary data for anyone who later uses
them. In other words, the secondary source is the agency which
publishes or releases for use by others data which were not
originally collected and processed by it.


It may be observed that the distinction between primary and
secondary data is a matter of degree or relativity only. The same
set of data may be secondary in the hands of one and primary in
the hands of another. In general, the data are primary to the source
which collects and processes them for the first time and secondary
for all other sources which later use them. For instance, the
data relating to mortality (death rates) and fertility (birth rates) in
India published by the Office of the Registrar General of India, New


*The earliest use of this story seems to have been made in Sir Josiah
Stamp, ‘Some Economic Factors in Modern Life’, P.S. King and Son,
London, 1929, pp. 258-259.

Delhi, are primary, whereas the same data reproduced by the United
Nations Organisation (U.N.O.) in its U.N. Statistical Abstract become
secondary in so far as the latter agency (U.N.O.) is concerned.
For these data, the Office of the Registrar General of India is the
primary source while the U.N.O. is the secondary source. Likewise, the
data collected by the C.S.O. and the N.S.S.O. for various surveys are
primary as far as these departments are concerned, but they become
secondary if they are used by other departments or organisations.
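
The relative nature of the primary/secondary distinction can be put in a few lines of Python. This is only an illustrative sketch: the function name `data_status` and its two parameters are hypothetical, and the rule encoded is simply the one stated above, namely that data are primary to the agency which first collected and processed them and secondary to every other user.

```python
def data_status(collected_by: str, used_by: str) -> str:
    """Return 'primary' if the user is the original collector, else 'secondary'."""
    return "primary" if collected_by == used_by else "secondary"


collector = "Office of the Registrar General of India"

# The same set of mortality and fertility data, seen from two sides.
print(data_status(collector, collector))   # primary
print(data_status(collector, "U.N.O."))    # secondary
```

The point of the sketch is that the status is a property of the pair (collector, user) and not of the data themselves.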

2.2.1. Choice Between Primary and Secondary Data.
Obviously, there is a lot of difference in the method of collection of
primary and secondary data. In the case of primary data, which are
to be collected originally, the entire scheme of the plan, starting
with the definitions of various terms used, units to be employed, type
of enquiry to be conducted, extent of accuracy aimed at etc., is to be
formulated, whereas the collection of secondary data is in the form
of mere compilation of the existing data. A proper choice between
the types of data (primary or secondary) needed for any particular
statistical investigation is to be made after taking into consideration
the nature, objective and scope of the enquiry ; the time and finances
(money) at the disposal of the agency ; the degree of precision aimed
at ; and the status of the agency (whether government, state or
central, or private institution or an individual).


Remarks 1. In using secondary data, it is best to
obtain the data from the primary source as far as possible. By doing
so, we would at least save ourselves from the errors of transcription
(if any) which might have inadvertently crept into the secondary
source. Moreover, the primary source will also provide us with a
detailed discussion of the terminology used, statistical units
employed, size of the sample and the technique of sampling (if the
sample method was used), methods of data collection and analysis
of results, and we can ascertain for ourselves if these suit our purpose.


2. It may be pointed out that today, in a large number of
statistical enquiries, secondary data are generally used because fairly
reliable published data on a large number of diverse fields are now
available in the publications of governments (state or central), private
organisations and research institutions, international agencies,
periodicals and magazines etc. In fact, primary data are collected
only if there do not exist any secondary data suited to the
investigation under study. In some investigations, both primary as
well as secondary data may be used.


3. Internal and External Data. Some statisticians differentiate
between primary and secondary data in the form of internal
and external data. Internal data of an organisation (a business or
economic concern or firm) are those which are collected by the
organisation from its own internal operations like production, sales,
profits, loans, imports and exports, capital employed etc., and used
by it for its own purposes. On the other hand, external data are
those which are obtained from the publications of some other agencies
like governments (central or state), international bodies, private
research institutions etc., for use by the given organisation.


2.3. Methods of Collecting Primary Data. The methods
commonly used for the collection of primary data are enumerated
below :
(i) Direct personal investigation.
(ii) Indirect oral interviews.
(iii) Information received through local agencies.
(iv) Mailed questionnaire method.
(v) Schedules sent through enumerators.


2.3.1. Direct Personal Investigation. This method consists
in the collection of data personally by the investigator (organising
agency) from the sources concerned. In other words, the investigator
has to go to the field personally for making enquiries and
soliciting information from the informants or respondents. This
nature of investigation very much restricts the scope of the enquiry.
Obviously, this technique is suited only if the enquiry is intensive
rather than extensive. In other words, this method should be used
only if the investigation is generally local, i.e., confined to a single
locality, region or area. Since such investigations require the personal
attention of the investigator, they are not suitable for extensive
studies where the scope of investigation is very wide. Obviously, the
information gathered from such an investigation is original in nature.


Merits. (i) The first-hand information obtained by the investigator
himself is bound to be more reliable and accurate, since
the investigator can extract the correct information by removing the
doubts, if any, in the minds of the respondents regarding certain
questions. In case the investigator suspects foul play on the part
of the respondent(s) in supplying wrong information on certain items,
he can check it by some intelligent cross-questioning.


(ii) The data obtained from such an investigation are generally
reliable if the type of enquiry is intensive in nature and if time and
money do not pose any problems for the investigator.


(iii) When the audience is approached personally by the investigator,
the response is likely to be more encouraging.


(iv) Different persons have their own ideas, likes and dislikes,
and their opinions on some of the questions may be coloured by
their own prejudices and vision ; as such, some of them might
react very sharply to certain sensitive questions posed to them. The
investigator, being on the spot, can handle such a delicate situation
creditably and effectively by his skill, intelligence and insight, either
by changing the topic or, if need be, by explaining to the respondent
in polite words the objectives of the survey in detail.

(v) The investigator can extract proper information from the
respondents by talking to them at their educational level and, if need
be, asking them questions in their language of communication and using
local connotations, if any, for the words used.


Demerits. (i) As already pointed out, this type of investigation
is restrictive in nature and is suited only for intensive studies
and not for extensive enquiries. This method is thus not suitable if
the field of investigation is too wide in terms of the number of persons
to be interviewed or the area to be covered.


(ii) This type of investigation is handicapped by lack of
time, money and manpower (labour). It is particularly time-consuming
since the informants can be approached only at their convenience ;
in the case of the working class, this restricts the contact of the
investigator with the informants to the evenings or the week-ends,
and consequently the investigation has to be spread over a long
period.


(iii) The greatest drawback of this enquiry is that it is absolutely
subjective in nature. The success of the investigation largely
depends upon the intelligence, skill, tact, insight, diplomacy and
courage of the investigator. If the investigator lacks these qualities
and is not properly trained, the results of the enquiry cannot be taken
as satisfactory or reliable. Moreover, the personal biases, prejudices
and whims of the investigator may, in certain cases, adversely affect
the findings of the enquiry.


(iv) Further, if the investigator is not intelligent, tactful or 
skilful enough to understand the psychologies and customs of the 
interviewing audience, the results obtained from such an investiga- 
tion will not be reliable. 


2.3.2. Indirect Oral Investigation. When ‘direct personal
investigation’ is not practicable, either because of the unwillingness
or reluctance of the persons to furnish the requisite information,
or due to the extensive nature of the enquiry, or due to the fact that
direct sources of information do not exist or are unreliable, an
indirect oral investigation is carried out. For example, if we want
to solicit information on certain social evils, like whether a person
is addicted to drinking, gambling or smoking etc., the person will be
reluctant to furnish correct information or he may give wrong
information. The information on the gambling, drinking or smoking
habits of an individual can best be obtained by interviewing his
personal friends, relatives or neighbours who know him thoroughly well.
In these types of enquiries, factual data on different problems are
collected by interviewing persons who are directly or indirectly
concerned with the subject matter of the enquiry and who are in
possession of the requisite information. The method consists in the
collection of the data through enumerators appointed for this purpose. A
small list of questions pertaining to the subject matter of the enquiry
is prepared. These questions are then put to the persons, known
as witnesses or informants, who are in possession of such information,
and their replies are recorded. Such a procedure for the collection
of factual data on different problems is usually adopted by the
Enquiry Committees or Commissions appointed by the government
(state or central).


Merits. (i) Since the enumerators contact the informants 
personally, as discussed in the first method, they can exercise their 
intelligence, skill, tact etc., to extract correct and relevant informa- 
tion by cross examination of the informants, if necessary. 


(ii) As compared with the method of ‘direct personal investigation’,
this method is less expensive and requires less time for
conducting the enquiry.


(iii) If necessary, the expert views and suggestions of the
specialists on the given problem can be obtained in order to formulate
and conduct the enquiry more effectively and efficiently.


Demerits. (i) Due to lack of direct supervision and personal
touch, the investigator (sponsoring agency) has to rely entirely on the
information supplied by the enumerators. The success of the method
lies in the intelligence, skill, insight and efficiency of the enumerators
and also on the fact that they are honest persons with high integrity
and without any selfish motives. It should be ascertained that
the enumerators are properly trained and tactful enough to elicit
proper and correct responses from the informants. Moreover, it
should be seen that the personal biases due to the prejudices and
whims of the enumerators do not enter, or at least are minimised.


(ii) The accuracy of the data collected and the inferences
drawn depend to a large extent on the nature and quality of the
witnesses from whom the information is obtained. A wrong and
improper choice of the witnesses will give biased results which
may adversely affect the findings of the enquiry. It is, therefore,
imperative :


(a) To ascertain the reliability and integrity of the persons
(witnesses) selected for interrogation. In other words, it should be
ascertained that the witnesses are unbiased persons without any
selfish motives and that they are not prejudiced in favour of or
against a particular viewpoint.


(b) That the findings of the enquiry are not based on the
information supplied by a single person alone. Rather, a sufficient
number of persons should be interviewed to find out the real
position.


(c) That the witnesses really possess the knowledge about the
problem under study, i.e., they are aware of the full facts of the
problem under investigation and are in a position to give a clear,
detailed and correct account of the problem.


(d) That a proper allowance for the pessimism or optimism
of the witnesses, depending upon their inherent psychology, should
be made.

2.3.3. Information Received Through Local Agencies. In
this method the information is not collected formally by the investigator
or the enumerators. This method consists in the appointment
of local agents (commonly called correspondents) by the investigator
in different parts of the field of enquiry. These correspondents or
agencies in different regions collect the information according to their
own ways, fashions, likings and decisions and then submit their
reports periodically to the central or head office, where the data are
processed for final analysis. This technique of data collection is
usually employed by newspaper or periodical agencies who require
information in different fields like sports, riots, strikes, accidents,
economic trends, business, stock and share markets, politics and so on.
This method is also used by the various departments of the government
(state or central) where the information is desired periodically
(at regular intervals of time) from a wide area. This method is
particularly useful in obtaining the estimates of agricultural crops,
which may be submitted to the government by the village school
teachers. A more refined and sophisticated way of using this
technique is the registration method, in which any event, say, birth,
death, incidence of disease etc., is to be reported to the appropriate
authority appointed by the government, like the Sarpanch or Patwari in
the village, or the Block Development Officers (B.D.O.’s), civil hospitals
or the health departments in the district headquarters etc., as and
when, or immediately after, it occurs. Vital statistics, i.e., the data
relating to mortality (deaths) and fertility (births), are usually
collected in India through the registration technique.


Merits. This method works out to be very cheap and economical
for extensive investigations, particularly if the data are obtained
through part-time correspondents or agents. Moreover, the required
information can be obtained expeditiously since only rough estimates
are required.


Demerits. Since the different correspondents collect the 
information in their own fashion and style, the results are bound to 
be biased due to the personal prejudices and whims of the corres- 
pondents in different fields of the enquiry and consequently the data 
so obtained will not be very reliable. Hence, this technique of data 
collection is suited if the purpose of investigation is to obtain rough 
and approximate estimates only and where a high degree of accu- 
racy is not desired. 


In particular, the registration method suffers from the drawback
that many persons do not report the events and thus neglect to register
them. This usually results in under-estimation. For an effective and
efficient system of registration, there should be legal compulsion for
the registration of events and also sanctions for the enforcement
of the obligation.
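
The effect of incomplete registration on a vital rate can be seen from a small piece of arithmetic, sketched below in Python. All the figures (a population of 50,000, 1,500 actual births and 80 per cent of them registered) are hypothetical, as are the function names; they are chosen only to show the direction and size of the under-estimation.

```python
def rate_per_thousand(events: float, population: int) -> float:
    """Crude rate: number of events per 1,000 of population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000.0 * events / population


def registered_events(true_events: int, coverage: float) -> float:
    """Events that reach the register when only a fraction is reported."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between 0 and 1")
    return true_events * coverage


# Hypothetical figures for one district in one year.
population = 50_000
true_births = 1_500
coverage = 0.80          # assumed: 80 per cent of births are registered

true_rate = rate_per_thousand(true_births, population)
recorded_rate = rate_per_thousand(registered_events(true_births, coverage), population)

print(f"True birth rate     : {true_rate:.1f} per 1,000")      # 30.0
print(f"Recorded birth rate : {recorded_rate:.1f} per 1,000")  # 24.0
```

Whatever the coverage, the recorded rate can never exceed the true rate, which is why neglect of registration leads to under-estimation and never to over-estimation.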

2.3.4. Mailed Questionnaire Method. This method consists
in preparing a questionnaire (a list of questions relating to the field
of enquiry and providing space for the answers to be filled in by the
respondents) which is mailed to the respondents with a request for
a quick response within the specified time. A very polite covering
note, explaining in detail the aims and objectives of collecting the
information and also the operational definitions of the various terms
and concepts used in the questionnaire, is attached. Respondents are
also requested to extend their full co-operation by furnishing
correct replies and returning the questionnaire, duly filled in, in time.
Respondents are also taken into confidence by assuring them that
the information supplied by them in the questionnaire will be kept
strictly confidential and secret. In order to ensure a quick and better
response, the return postage expenses are usually borne by the
investigator by sending a self-addressed stamped envelope. This
method is usually used by research workers, private individuals,
non-official agencies and sometimes even by the government (central
or state).


In this method, the questionnaire is the only medium of
communication between the investigator and the respondents.
Consequently, the most important factor for the success of the ‘mailed
questionnaire method’ is the skill, efficiency, care and wisdom
with which the questionnaire is framed. The questions asked should
be clear, brief, corroborative, non-offending, courteous in tone,
unambiguous and to the point, so that not much scope for guessing is
left on the part of the respondent. Moreover, while framing the
questions, the knowledge, understanding and general educational
level of the respondents should be taken into consideration.


Remark. ‘Drafting or framing the questionnaire’ is of
paramount importance and is discussed in detail in § 2.4 after
‘Schedules sent through enumerators’.


Merits. (i) Of all the methods of collecting information, the
‘mailed questionnaire method’ is by far the most economical method
in terms of time, money and manpower (labour), provided the
respondents supply the information in time.


(ii) This method is used for extensive enquiries covering a
very wide area.


(iii) Errors due to the personal biases of the investigators or
enumerators are completely eliminated as the information is
supplied directly by the person concerned in his own handwriting.
The information so obtained is original and much more authentic.


Demerits. (i) The most serious drawback of this method
is that it can be used effectively only if the respondents
are educated (literate) and can understand the questions
well and reply to them in their own handwriting. Obviously, this
method is not practicable if the people are illiterate. Even if they
are educated, there may be a number of persons who are not interested
in the particular enquiry being conducted and who, as such,
adopt an attitude of indifference towards the enquiry, which results
in their questionnaires finding a place in the waste paper basket. In
the case of those who return the questionnaires after filling them in, a
number supply haphazard, vague, incomplete and unintelligible
information which does not serve much purpose. Thus, this
method generally suffers from a high degree (i.e., a very large
proportion) of non-response, and consequently the results, based on the
information supplied by a very small proportion of the selected
individuals, cannot be regarded as reliable.


(ii) Quite often people might suppress correct information
and furnish wrong replies. We cannot verify the accuracy and
reliability of the information received. In general, this method also
suffers from a low degree of reliability of the information supplied
by the respondents.


(iii) Another limitation of this method is that at times, infor- 
mants are not willing to give written information in their own hand- 
writing on certain personal questions like income, property, personal 
habits and so on. 


(iv) Since the questionnaires are filled in by the respondents
personally, there is no scope for asking supplementary questions for
cross-checking the information supplied by them. Moreover,
the doubts in the minds of the informants, if any, on certain questions
cannot be dispelled.
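
The weight of demerit (i) can be seen from a short calculation, sketched here in Python. The figures (1,000 questionnaires mailed, 200 usable returns, 60 per cent of respondents answering ‘yes’ to some question) are hypothetical, and the function names are introduced only for this illustration. The bounds computed are worst-case limits: they assume nothing about the non-respondents except that each of them would have answered either ‘yes’ or ‘no’.

```python
from typing import Tuple


def response_rate(mailed: int, usable_returns: int) -> float:
    """Proportion of mailed questionnaires that came back in usable form."""
    if mailed <= 0 or not 0 <= usable_returns <= mailed:
        raise ValueError("need 0 <= usable_returns <= mailed and mailed > 0")
    return usable_returns / mailed


def proportion_bounds(rate: float, p_respondents: float) -> Tuple[float, float]:
    """Worst-case limits for the 'yes' proportion in the whole selected group."""
    lower = rate * p_respondents                 # every non-respondent says 'no'
    upper = rate * p_respondents + (1.0 - rate)  # every non-respondent says 'yes'
    return lower, upper


# Hypothetical enquiry.
rate = response_rate(mailed=1000, usable_returns=200)
low, high = proportion_bounds(rate, p_respondents=0.60)

print(f"Response rate: {rate:.0%}")
print(f"'Yes' proportion among all selected lies between {low:.2f} and {high:.2f}")
```

With only one-fifth of the questionnaires returned, the observed figure of 60 per cent is compatible with a true proportion anywhere in a very wide interval, which is the sense in which results based on a small proportion of the selected individuals cannot be regarded as reliable.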


2.3.5. Schedules Sent Through Enumerators. Before
discussing this method, it is desirable to make a distinction between
a questionnaire and a schedule. As already explained, a questionnaire
is a list of questions which are answered by the respondent himself
in his own handwriting, while a schedule is the device for obtaining
answers to the questions in a form which is filled in by the interviewers
or enumerators (the field agents who put these questions) in a
face-to-face situation with the respondents. The most widely used method
of collection of primary data is ‘schedules sent through
enumerators’. This is so because this method is free from certain
shortcomings inherent in the methods discussed so far. In
this method, the enumerators go to the respondents personally with
the schedule (list of questions), ask them the questions therein and
record their replies. This method is generally used by big business
houses, large public enterprises and research institutions like the
National Council of Applied Economic Research (NCAER), the Federation
of Indian Chambers of Commerce and Industry (FICCI)
and so on, and even by the governments (state or central) for certain
projects and investigations where a high degree of response is desired.
The population census, all over the world, is conducted by this
technique.


Merits. (i) The enumerators can explain in detail the objectives
and aims of the enquiry to the informants and impress upon
them the need and utility of furnishing the correct information.
Being on the spot, the enumerators can dispel the doubts, if any, of
certain people about certain questions by explaining to them the
implications of certain definitions and concepts used in the
questionnaire.

(ii) This technique is very useful in extensive enquiries and
generally yields fairly dependable and reliable results due to the
fact that the information is recorded by highly trained and educated
enumerators. Moreover, since the enumerators personally call on
the respondents to obtain information, there is very little
non-response ; it occurs only if it is not possible to contact the
respondents even after repeated calls or if the respondent is unwilling
to furnish the requisite information. Thus, this method removes both
the drawbacks of the ‘mailed questionnaire method’, viz., a very large
proportion of non-response and a fairly low degree of reliability of the
information.


(iii) Unlike the ‘mailed questionnaire method’, this technique 
can be used with advantage even if the respondents are illiterate. 


(iv) As already pointed out in the ‘direct personal
investigation’, due to personal likes and dislikes, different people react
differently to different questions and as such some people might
react very sharply to certain sensitive and personal questions. In
that case the enumerators, by their tact, skill, wisdom and calibre,
can handle the situation very effectively by changing the topic of
discussion, if need be. Moreover, the enumerators can effectively
check the accuracy of the information supplied through some intelligent
cross-questioning, by asking some supplementary questions.


Demerits. (i) It is a fairly expensive method since the team of
enumerators is to be paid for their services and as such can be used
by only those bodies or institutions which are financially sound.


(ii) It is also more time consuming as compared with the
‘mailed questionnaire method’.


(iii) The success of the method largely depends upon the
efficiency and skill of the enumerators who collect the information.
Thus the choice of enumerators is of paramount importance. The
enumerators have to be trained properly in the art of collecting
correct information by their intelligence, insight, patience and
perseverance, diplomacy and courage. They should clearly understand
the aims and objectives of the enquiry and also the implications of
the various terms, definitions and concepts used in the
questionnaire. They should be provided with adequate guidelines so that
their personal biases do not enter the final results of the enquiry.
They should be honest persons with high integrity and should not
have any personal axe to grind. They should be well versed in the
local language, customs and traditions. If the enumerators are
biased they may suppress or even twist the information supplied by
the respondents. Inefficiency on the part of the enumerators coupled
with personal biases due to their prejudices and whims will lead to
false conclusions and may even adversely affect the results of the
enquiry.


(iv) Due to inherent variation in the individual personalities
of the enumerators there is bound to be variation, though not so
obvious, in the information recorded by different enumerators. An
attempt should be made to minimise this variation.


(v) The success of this method also depends to a great extent on
the efficiency and wisdom with which the schedule is prepared or
drafted. If the schedule is framed haphazardly and incompetently,
the enumerators will find it very difficult to get the complete and
correct desired information from the respondents.


Remarks. 1. In the last two methods viz., the ‘mailed
questionnaire method’ and the ‘schedules sent through enumerators’, it is
desirable to scrutinise the questionnaires or schedules duly filled in
for detecting any apparent inconsistency in the information supplied
by the respondents or recorded by the enumerators.


2. If resources (time, money and manpower) permit, two sets
of enumerators may be used for recording information for the
enquiry under investigation and their findings may be compared.
This will, incidentally, provide a check on the honesty and integrity
of the enumerators and will also reflect upon personal bias due to
the prejudices and whims of the individual personalities (of the
enumerators). However, this technique is not practicable in the case
of interviewing individuals, who might get irritated, annoyed or
confused when approached for the second time.
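
The comparison of two sets of enumerators suggested in Remark 2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `compare_enumerators`, the respondent and question labels and the sample records are all hypothetical, and the share of differing answers is just one simple way of summarising disagreement.

```python
def compare_enumerators(first, second):
    """Compare answers recorded by two enumerators for the same items.

    Each argument maps (respondent_id, question_id) to the recorded answer.
    Returns the share of common items that differ and the differing keys.
    """
    common = sorted(set(first) & set(second))
    if not common:
        raise ValueError("the two sets share no respondent/question pairs")
    differing = [key for key in common if first[key] != second[key]]
    return len(differing) / len(common), differing


# Hypothetical records for four respondents and one question.
set_a = {("R1", "Q1"): "Yes", ("R2", "Q1"): "No",
         ("R3", "Q1"): "Yes", ("R4", "Q1"): "No"}
set_b = {("R1", "Q1"): "Yes", ("R2", "Q1"): "No",
         ("R3", "Q1"): "No", ("R4", "Q1"): "No"}

rate, differing = compare_enumerators(set_a, set_b)
print(rate, differing)
```

A high share of differing answers does not by itself say which enumerator is at fault; it only points to the schedules that need a closer scrutiny.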


2.4. Drafting or Framing the Questionnaire. As has been
pointed out earlier, the questionnaire is the only medium of
communication between the investigator and the respondents and as such the
questionnaire should be designed or drafted with utmost care
and caution so that all the relevant and essential information for
the enquiry may be collected without any difficulty, ambiguity or
vagueness. Drafting of a good questionnaire is a highly specialised
job and requires great care, skill, wisdom, efficiency and experience.
No hard and fast rules can be laid down for designing or framing a
questionnaire. However, in this connection, the following general
points may be borne in mind :

1. The size of the questionnaire should be as small as possible.
The number of questions should be restricted to the minimum,
keeping in view the nature, objectives and scope of the enquiry. In
other words, the questionnaire should be concise and should contain
only those questions which would furnish all the necessary
information relevant for the purpose. Respondents' time should not be
wasted by asking irrelevant and unimportant questions. A large
number of questions would involve more work for the investigator
and thus result in delay on his part in collecting and submitting the
information. These may, in addition, also unnecessarily annoy or
tire the respondents. A reasonable questionnaire should contain
from 15 to 25 questions. If a still larger number of questions is
a must in any enquiry, then the questionnaire should be divided
into various sections or parts.


2. The questions should be clear, brief, unambiguous,
non-offending, courteous in tone, corroborative in nature and to the
point so that not much scope for guessing is left on the part of the
respondents.


3. The questions should be arranged in a natural logical
sequence. For example, to find if a person owns a refrigerator the
logical order of questions would be : Do you own a refrigerator?
When did you buy it? What is its make? How much did it cost
you? Is its performance satisfactory? Have you ever got it
serviced? The logical arrangement of questions, in addition to
facilitating tabulation work, would leave no chance for omissions
or duplication.


4. The usage of vague and ‘multiple meaning’ words should be
avoided. Vague words like good, bad, efficient, sufficient,
prosperity, rarely, frequently, reasonable, poor, rich, etc., should not be
used since these may be interpreted differently by different persons
and as such might give unreliable and misleading information.
Similarly, words with multiple meanings like price, assets,
capital, income, household, democracy, socialism etc., should not
be used unless a clarification of these terms is given in the
questionnaire.


5. Questions should be so designed that they are readily
comprehensible and easy to answer for the respondents. They should not
be tedious nor should they tax the respondents’ memory. Further,
questions involving mathematical calculations like percentages,
ratios etc., should not be asked.


6. Questions of a sensitive and personal nature should be
avoided. Questions like ‘How much money do you owe to private
parties?’ or ‘Do you clean your utensils yourself?’ which might
hurt the sentiments, pride or prestige of an individual should not be
asked, as far as possible. It is also advisable to avoid questions on
which the respondent may be reluctant or unwilling to furnish
information. For example, the questions pertaining to income,
savings, habits, addiction to social evils, age (particularly, in case of
ladies) etc., should be asked very tactfully.


7. Types of Questions. Under this head, the questions in the 
questionnaire may be broadly classified as follows : 


(a) Shut Questions. In such questions possible answers are
suggested by the framers of the questionnaire and the respondent is
required to tick one of them. Shut questions can further be
sub-divided into the following forms :

(i) Simple Alternate Questions. In such questions, the
respondent has to choose between two clear cut alternatives like
‘Yes or No’; ‘Right or Wrong’; ‘Either, Or’ and so on. For
instance, ‘Do you own a refrigerator?’ (Yes or No). Such questions
are also called dichotomous questions. This technique can be applied
with elegance to situations where two clear cut alternatives exist.


(ii) Multiple Choice Questions. Quite often, it is not
possible to define a clear cut alternative and accordingly in such a
situation either the first method (Alternate Questions) is not used or
additional answers between Yes and No, like Do not know, No
opinion, Occasionally, Casually, Seldom etc., are added. For
instance, to find if a person smokes or drinks, the following multiple
choice answers may be used :


Do you smoke?
Yes (Regularly)  □      No (Never)  □
Occasionally     □      Seldom      □


Similarly, to get information regarding the mode of cooking
in a household, the following multiple choice answers may be
suggested.


Which of the following modes of cooking do you use?

Gas               □
Power             □
Stove (Kerosene)  □
Coal (Coke)       □
Wood              □


As another illustration, to find what conveyance an individual
uses to go from his house to the place of his duty, the following
question with multiple answers may be framed :


How do you go to your place of duty?
By bus
By your own cycle
By your own scooter/motor cycle
By your own car
By three wheeler scooter
By taxi
On foot
Any other

Multiple choice questions are very easy and convenient for
the respondents to answer. Such questions save time and also
facilitate tabulation. This method should be used only if a selected
few alternative answers exist to a particular question. Sometimes,
a last alternative under the category ‘Others’ or ‘Any other’ may be
added. However, multiple answer questions cannot be used with
advantage if it is possible to construct a fairly large number of
alternative answers of relatively equal importance to a given
question.
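
The remark that multiple choice questions facilitate tabulation can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the option list is taken from the mode-of-cooking question above, while the function name `tabulate` and the sample replies are hypothetical.

```python
from collections import Counter

# Options of the mode-of-cooking question, in the order printed on the form.
OPTIONS = ["Gas", "Power", "Stove (Kerosene)", "Coal (Coke)", "Wood"]


def tabulate(responses, options=OPTIONS):
    """Return (option, frequency, percentage) rows for ticked answers."""
    counts = Counter(responses)
    unexpected = sorted(set(counts) - set(options))
    if unexpected:
        # A ticked answer outside the printed options signals a recording error.
        raise ValueError(f"answers not on the form: {unexpected}")
    total = len(responses)
    return [
        (opt, counts[opt], round(100 * counts[opt] / total, 1) if total else 0.0)
        for opt in options
    ]


# Hypothetical replies from four households.
table = tabulate(["Gas", "Wood", "Gas", "Power"])
for row in table:
    print(row)
```

Because every reply falls into one of a few fixed categories, the frequency table follows directly from a count; open questions offer no such shortcut.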


(b) Open Questions. Open questions are those in which no
alternative answers are suggested and the respondents are at liberty
to express their frank and independent opinions on the problem in
their own words. For instance, ‘What are the drawbacks in our
examination system?’ ; ‘What solution do you suggest to the housing
problem in Delhi?’ ; ‘Which programme in the Delhi TV do you
like best?’ ; are some of the open questions. Since the views of the
respondents in the open questions might differ widely, it is very
difficult to tabulate the diverse opinions and responses.


Remark. Sometimes a combination of both shut questions
and open questions might be used. For instance, an open question :
‘When did you buy the car?’, may be followed by a multiple choice
question as to whether its performance is (i) extremely good,
(ii) satisfactory, (iii) poor, (iv) needs improvement.


8. Leading questions should be avoided. For example, the
question ‘Why do you use a particular brand of blades, say, Erasmic
blades?’ should preferably be framed into two questions :


(i) Which blade do you use?
(ii) Why do you prefer it?
     Gives a smooth shave
     Gives more shaves
     Price is less (Cheaper)
     Readily available in the market
     Any other


9. Cross Checks. The questionnaire should be so designed
as to provide internal checks on the accuracy of the information
supplied by the respondents by including some connected questions
at least with respect to matters which are fundamental to the
enquiry. For example, in a social survey for finding the age of the
mother the question ‘What is your age?’ can be supplemented by
additional questions ‘What is your date of birth?’ or ‘What is the
age of your eldest child?’ Similarly, the question ‘Age at marriage’
can be supplemented by the question ‘The age of the first child’.
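
The internal check described in point 9 can be sketched as a small Python routine that compares the stated age with the age implied by the date of birth and with the age of the eldest child. The function names, the one-year tolerance and the minimum gap of 12 years between mother and eldest child are illustrative assumptions, not rules laid down in the text.

```python
from datetime import date


def age_on(birth_date, survey_date):
    """Completed years of age on the survey date."""
    years = survey_date.year - birth_date.year
    if (survey_date.month, survey_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in the survey year
    return years


def cross_check(stated_age, birth_date, eldest_child_age, survey_date, min_gap=12):
    """List the inconsistencies among three connected answers.

    min_gap is an assumed smallest plausible age gap (in years)
    between a mother and her eldest child.
    """
    problems = []
    if abs(age_on(birth_date, survey_date) - stated_age) > 1:
        problems.append("stated age disagrees with date of birth")
    if stated_age - eldest_child_age < min_gap:
        problems.append("stated age too close to age of eldest child")
    return problems


# Hypothetical schedules recorded on 1 June 1983.
survey_day = date(1983, 6, 1)
consistent = cross_check(35, date(1948, 3, 10), 12, survey_day)
doubtful = cross_check(30, date(1948, 3, 10), 25, survey_day)
print(consistent, doubtful)
```

A schedule flagged in this way is not necessarily wrong; it is merely singled out for re-examination by the investigator.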


10. Pre-testing the Questionnaire. From a practical point of view
it is desirable to try out the questionnaire on a small scale (i.e., on 
a small cross-section of the population for which the enquiry is in- 
tended) before using it for the given enquiry on a large scale. This 
testing on a small scale (called pre-test) has been found to be ex- 
tremely useful in practice. The given questionnaire can be improved 
or modified in the light of the drawbacks, shortcomings and prob- 
lems faced by the investigator in the pre-test. Pre-testing also helps 
to decide upon the effective methods of asking questions for solici- 
ting the requisite information. 


11. A Covering Letter. A covering letter from the organisers
of the enquiry should be enclosed along with the questionnaire for
the following purposes :

(i) It should clearly explain in brief the objectives and scope of
the survey to evoke the interest of the respondents and impress upon
them to render their full co-operation by returning the schedule/
questionnaire duly filled in within the specified period.


(ii) It should contain a note regarding the operational
definitions of the various terms and the concepts used in the questionnaire,
units of measurements to be used and the degree of accuracy aimed
at.


(iii) It should take the respondents into confidence and assure
them that the information furnished by them will be kept completely
secret and they will not be harassed in any way later.


(iv) In the case of mailed questionnaire method a self-address- 
ed stamped envelope should be enclosed for enabling the respondents 
to return the questionnaire after completing it. 


(v) To ensure quick and better response the respondents may 
be offered awards/incentives in the form of free gifts, coupons etc. 


(vi) A copy of the survey report may be promised to the
interested respondents.


12. Mode of tabulation and analysis viz., hand operated, 
machine tabulation or computerisation should also be kept in mind 
while designing the questionnaire. 


13. Lastly, the questionnaire should be made attractive by 
proper layout and appealing get up. 
We give below two specimen questionnaires for illustration. 
MODEL 1 


Questionnaire for Collecting Information (Covering 
Production, Employment etc.) Relating to an 
Industrial concern 
1. Name of the concern......
2. (a) Name of the Proprietor/Managing Director......
   (b) Qualifications :
       (i) Academic......
       (ii) Technical/Professional......
3. (a) Location                 (b) Telephone Number
       (i) Factory...               (i) Factory...
       (ii) Office...               (ii) Office...
4. Factory Registration No. (with date)...
5. Total Capital employed/Assets (approximately)...
6. Number of Shifts......
7. Whether the machinery used is indigenous.
       Yes □    No □    Other □
8. What is the approximate value of the imported machinery?......
9. Whether the raw material used is available in domestic
   market.    Yes    No
10. From which country is the raw material imported?...
11. What is the approximate annual consumption of the raw
    material?
12. Expenditure in Foreign currency :
       (i) Foreign Travel ......
       (ii) Technical know-how ......
       (iii) Material & Goods ......


13. Employment (Pay-roll)

                                    No. employed   |   Salaries paid
    Categories                      1982    1983   |   1982    1983

    Management
    Supervisory/Technical Personnel
    Skilled workers
    Unskilled workers
    Non-technical office staff


14. Production :

    Items           Installed       Actual Production
    (Production)    Capacity        1982    1983


15. Market                              1982    1983
    (a) Gross value of sales            .....   .....

    (b) Who are :
        (i) Immediate purchasers...
        (ii) End users...

    (c) The extent of market is
        Local □    National □    International □
    (d) If the extent of the market is international
        (i) What are the approximate foreign exchange
            earnings (annually)?
        (ii) Which countries are the chief importers of the
            product?
    (e) Total Sales :
        National market : .........
        International market : .........
    (f) Are the present conditions of the market satisfactory/
        not satisfactory/poor?......


16. Financial Highlights

    Sales
    Profits
    Dividends
    Capital expenditure
    Fixed assets
    Share holders’ funds

MODEL II


We give below the 1971 Census-Individual Slip which was used
for a general purpose survey to collect :

(i) Social and cultural data like nationality, religion, literacy,
mother tongue etc.,

(ii) Exhaustive economic data like occupation, industry, class
of worker and activity, if not working ;

(iii) Demographic data like relation to the head of the house,
sex, age, marital status, birth place, births and deaths and
the fertility of women to assess in particular the
performance of the family planning programme.


1971 CENSUS—INDIVIDUAL SLIP 


Sex...
Marital status...
For currently married women only :
    (a) Age at marriage...
    (b) Any child born in the last one year...
Birth place :
    (i) Place of birth...           (ii) Rural or urban...
    (iii) District...               (iv) State/country...
Last Residence :
    (i) Place of last residence...  (ii) Rural/urban...
    (iii) District...               (iv) State/country...
Duration of present residence...
Religion...
Scheduled Caste or Tribe...
Literacy...
Educational level...
Mother tongue...
Other languages, if any...
Main activity :
    (a) Broad category :
        (i) Worker (C, AL, HHI, OW)*
        (ii) Non-worker (H, ST, R, D.B.I.O.)**
    (b) Place of work (Name of Village/Town)...
    (c) Name of establishment...
    (d) Name of Industry, Trade, Profession, or Service...
    (e) Description of work...
    (f) Class of worker...
Secondary work :
    (a) Broad category (C, AL, HHI, OW)
    (b) Place of work...
    (c) Name of establishment...
    (d) Nature of Industry, Trade, Profession or Service...
    (e) Description of work...
    (f) Class of worker...


*C : Cultivator                   **H : Household Duties
 AL : Agricultural Labour           ST : Student
 HHI : House Hold Industries        R : Retired person or Rentier
 OW : Other Works                   DBIO : Dependent, Beggar,
                                           Institutions, Others


2.5. Sources of Secondary Data. The chief sources of
secondary data may be broadly classified into the following two
groups :

(i) Published sources.

(ii) Unpublished sources.


2.5.1. Published Sources. There are a number of national
(government, semi-government and private) organisations and also
international agencies which collect statistical data relating to
business, trade, labour, prices, consumption, production, industries,
agriculture, income, currency and exchange, health, population and
a number of socio-economic phenomena and publish their findings
in statistical reports on a regular (monthly, quarterly, annual) or
ad-hoc basis. These publications of the various organisations
serve as a very powerful source of secondary data. We give below
a brief summary of these sources.


1. Official Publications of Central Government. The following
are various government organisations, along with the year of their
establishment (given in brackets), which collect, compile and
publish statistical data on a number of topics of current interest
such as prices, wages, population, production and consumption, labour,
trade, army etc.


(1) Office of the Registrar General and Census Commissioner
of India, New Delhi (1949)*.


(2) Directorate-General of Commercial Intelligence and
Statistics, Ministry of Commerce (1895).


(3) Labour Bureau, Ministry of Labour (1946).


(4) Directorate of Economics and Statistics, Ministry of
Agriculture and Irrigation (1948).


(5) The Indian Army Statistical Organisation (I.A.S.O.),
Ministry of Defence (1947).


(6) National Sample Survey Organisation (N.S.S.O.),
Department of Statistics, Ministry of Planning (1950)**.


(7) Central Statistical Organisation (C.S.O.), Department of
Statistics, Ministry of Planning (1951).


Some of the main publications of the above government 
agencies are : 


(a) Monthly Abstract of Statistics ; Monthly Statistics of
Production of Selected Industries in India ; Statistical Pocket Book,
India ; Annual Survey of Industries: General Review ; Sample
Surveys of Current Interest in India (all published annually) ;
Statistical Systems of India ; National Income Statistics: Estimates of
Savings in India (1960-61 to 1965-66) ; National Income Statistics:
Estimates of Capital Formation in India (1960-61 to 1965-66)
(Ad-hoc publications) ; all published by the Central Statistical
Organisation (C.S.O.), New Delhi.


*In India the census has been carried out every ten years since 1881. Prior
to the establishment of this organisation, the census was conducted by a
temporary cell in the Ministry of Home Affairs.

**National Sample Survey (N.S.S.) was set up in 1950 in the Ministry of
Finance. In 1957, N.S.S. was transferred to the Cabinet Secretariat and named
National Sample Survey Organisation (N.S.S.O.).




(b) Census data in various census reports ; Vital Statistics of
India (Annual) ; Indian Population Bulletin (Bi-ennial) ; all
published by the Registrar-General of India (R.G.I.).


(c) Various statistical reports on phenomena relating to
socio-economic and demographic conditions, prices, area and yield of
different crops, as a result of the various surveys conducted in
different rounds by the National Sample Survey Organisation
(N.S.S.O.).

In addition to the above organisations, a number of
departments in the State and Central Governments like the Income Tax
Department, Directorate General of Supplies and Disposals,
Railways, Post and Telegraphs, Central Board of Revenues, Textile
Commissioner's Office, Central Excise Commissioner's Office, Iron
and Steel Controller’s Office and so on, publish statistical reports on
current problems, and the information supplied by them is, in
general, more authentic and reliable than that obtained from other
sources on the same subject.

2. Publications of Semi-Government Statistical Organisations. 
Very useful information is provided by the publications of the 
semi-government statistical organisations some of which are 
enumerated below : 


(i) Statistics department of the Reserve Bank of India
(Bombay), which brings out an Annual Report of the Bank,
Currency and Finance ; Reserve Bank of India Bulletin (monthly)
and various monthly and quarterly reports.

(ii) Economic department of the Reserve Bank of India.

(iii) The Institute of Economic Growth, Delhi.

(iv) Gokhale Institute of Politics and Economics, Poona.

(v) The Institute of Foreign Trade, New Delhi.


Moreover, the statistical material published by institutions
like Municipal and District Boards, Corporations, Block and
Panchayat Samitis on Vital Statistics (births and deaths), health,
sanitation and other related subjects provides fairly reliable and
useful information.


3. Publications of Research Institutions. Individual research
scholars, the different departments in the various universities of
India and various research organisations and institutes like Indian
Statistical Institute (I.S.I.), Calcutta and Delhi ; Indian Council of
Agricultural Research (I.C.A.R.), New Delhi ; Indian Agricultural
Statistics Research Institute (I.A.S.R.I.), New Delhi ; National
Council of Educational Research and Training (N.C.E.R.T.),
New Delhi ; National Council of Applied Economic Research, New
Delhi ; The Institute of Applied Man Power Research, New Delhi ;
The Institute of Labour Research, Bombay ; Indian Standards
Institute, New Delhi ; and so on publish the findings of their
research programmes in the form of research papers, monographs
or journals which are a constant source of secondary data on the
subjects concerned.


4. Publications of Commercial and Financial Institutions. A
number of private commercial and trade institutions like the Federation
of Indian Chambers of Commerce and Industries (FICCI), Institute
of Chartered Accountants of India, Trade Unions, Stock
Exchanges, Bank Bodies, Co-operative Societies, etc., publish reports
and statistical material on current economic, business and other
phenomena.

5. Reports of Various Committees and Commissions appointed
by the Government. The reports of the survey and enquiry
commissions and committees appointed by the Central and State
Governments to obtain expert views on some important matters relating
to economic and social phenomena like wages, dearness allowance, prices,
national income, taxation, land, education etc. are an invaluable
source of secondary information. Examples are the Simon-Kuznet
Committee report on National Income in India, the Wanchoo
Commission report on Taxation, the Kothari Commission report on
Educational Reforms, the Pay Commissions' Reports, the Land Reforms
Committee report and the Gupta Commission report on Maruti Affairs.


6. Newspapers and Periodicals. Statistical material on a
number of important current socio-economic problems can be
obtained from the numerical data collected and published by some
reputed magazines, periodicals and newspapers like Eastern
Economist, Economic Times, The Financial Express, Indian Journal of
Economics, Commerce, Capital, Transport, Statesman’s Year Book
and the Times of India Year Book etc.


7. International Publications. The publications of a number
of foreign governments or international agencies provide invaluable
statistical information on a variety of important economic and
current topics. The publications of the United Nations Organisation
(U.N.O.) like U.N.O. Statistical Year Book, U.N. Statistical
Abstract, Demographic Year Book, etc. ; and its subsidiaries like World
Health Organisation (W.H.O.) on contagious diseases ; annual
reports of International Labour Organisation (I.L.O.) ;
International Monetary Fund (I.M.F.) ; World Bank ; Economic and
Social Commission for Asia and Pacific (ESCAP) ; International
Finance Corporation (I.F.C.) ; International Statistical Education
Institute and so on are very valued publications of secondary data.


Remark. It may be pointed out that the various publications
enumerated above vary as regards the periodicity of their
publication. Some are published periodically at regular intervals of time
(such as weekly, monthly, quarterly or annually) whereas others
are ad-hoc publications which do not have any specific periodicity
of publication.

2.5.2. Un-Published Sources. The statistical data need not
always be published. There are various sources of unpublished
statistical material such as the records maintained by private firms
or business enterprises who may not like to release their data to
any outside agency ; the various departments and offices of the
Central and State Governments ; the researches carried out by the
individual research scholars in the universities or research institutes.


Remark. In some of the socio-economic surveys the
information is gathered from the respondents with the promise that it
is exclusively meant for research programmes and will be kept
strictly confidential. Such data are not published. In case they are
published, it is done with a brief note, namely, “Source :
Confidential.”


2.6. Precautions in the use of Secondary Data. Secondary 
data should be used with extra caution. Before using such data, the 
investigator must be satisfied regarding the reliability, accuracy, 
adequacy and suitability of the data to the given problem under 
investigation.

Proper care should be taken to edit it so that it is free from 
inconsistencies, errors and omissions. In the words of L.R. Connor 
“Statistics, especially other peoples’ statistics, are full of pitfalls for 
the user” and therefore, secondary data should not be used before 
subjecting it to a thorough and careful scrutiny. Prof. A.L. Bowley 
also remarks, "It is never safe to take the published statistics at 
their face value without knowing their meaning and limitations and 
it is always necessary to criticise the arguments that can be based 
upon them.” In using secondary data we should take a special note 
of the following factors. 


1. The Reliability of Data. In order to know about the reli- 
ability of the data, we should satisfy ourselves about : 


(i) the reliability, integrity and experience of the collecting
organisation,

(ii) the reliability of the source of information, and

(iii) the methods used for the collection and analysis of the
data.

It should be ascertained that the collecting agency was
unbiased in the sense that it had no personal motives and that, right from
the collection and compilation of the data to the presentation of
results in the final form in the selected source, the data were
thoroughly scrutinised and edited so as to make them free from errors as
far as possible. Moreover, it should also be verified that the data
relate to normal times free from periods of economic boom or
depression or natural calamities like famines, floods, earthquakes,
wars, etc., and are still relevant for the purpose in hand.

If the data were collected on the basis of a sample we should 
satisfy ourselves that : 


(i) The sample was adequate (not too small).

(ii) It was representative of the characteristics of the
population, i.e., it was selected by a proper sampling technique.

(iii) The data were collected by trained, experienced and
unbiased investigators under proper supervisory checks on the
field work so that sampling errors were minimised.

(iv) Proper estimation techniques were used for estimating the
parameters of the population.

(v) The desired degree of accuracy was achieved by the
compiler.


Remark. A source note, giving in detail the sources from
which the data were obtained, is imperative for the validity of the
secondary data since, to the learned users of statistics, the reputations of
the sources may vary greatly from one agency to another.


2. The Suitability of Data. Even if the data are reliable in 
the sense as discussed above it should not be used without confirm- 
ing that it is suitable for the purpose of enquiry under investigation. 
For this, it is important : 


(i) To observe and compare the objectives, nature and scope
of the given enquiry with the original investigation.


(ii) To confirm that the various terms and units were clearly
defined and uniform throughout the earlier investigation and these
definitions are suitable for the present enquiry also. For instance,
a unit like household, wages, prices, farm, etc., may be defined in
many different ways. If the units are defined differently in the
original investigation from what we want, the secondary data will be
termed unsuitable for the present enquiry. For example, if we
want to construct the cost of living indices, it must be ensured that
the original data relating to prices were obtained from retail shops,
co-operative stores or super bazars and not from the wholesale
market.


(iii) To take into account the difference in the timings of 
collection and homogeneity of conditions for the original enquiry 
and the investigation in hand. 


3. Adequacy of Data. Even if the secondary data are reliable
and suitable in terms of the discussion above, they may not be
adequate for the purpose of the given enquiry. This happens
when the coverage of the original enquiry was narrower or
wider than what is desired in the current enquiry or, in other
words, when the original data refer to an area or a period which is
much larger or smaller than the required one. For instance, if the
original data relate to the consumption pattern of the various
commodities by the people of a particular State, say, Maharashtra, then
they will be inadequate if we want to study the consumption pattern
of the people for the whole country. Similarly, if the original data
relate to yearly figures of a particular phenomenon, they will be
inadequate if we are interested in the monthly study. This is so
because of the fluctuations in the phenomenon in different regions
or periods.


Another important factor to decide about the adequacy of the
available data for the given investigation is the time period
for which the data are available. For example, if we are given the
values of a particular phenomenon (say, profits of a business concern
or production of a particular commodity) for the last 3-4 years, they
will be inadequate for studying the trend pattern, for which the
values for the last 8-10 years will be required.


Hence, in order to arrive at conclusions free from limitations
and inaccuracies, the published data, i.e., the secondary data, must
be subjected to thorough scrutiny and editing before they are accepted
for use.


EXERCISE 2.1 


1. (a) What do you mean by a statistical enquiry ? Describe the main
stages in a statistical enquiry.
[Nagarjuna U. B. Com. April 1981]


(b) Describe the process of planning a statistical inquiry, with special
reference to its scope and purpose, choice between sample and census
approaches, accuracy and analysis of data.


2. If you are appointed to conduct a statistical enquiry, describe in
general, what steps will you be taking from the stage of appointment till the
presentation of your report.

[C.A. (Intermediate) May 1977]


3. (a) Distinguish between (i) primary and secondary data (ii) sampling
and census.
[Delhi U. B. A. (Econ. Hons.) 1982]


(b) Differentiate between Primary and Secondary data and discuss the
various methods of collecting Primary data.
[Punjab U. B. A. (Econ. Hons.) 1980]


4. (a) What are the various methods of collecting statistical data ?
Which of these is most reliable and why ?
[Kurukshetra U. B. Com. 1980 ; Punjabi U. M. A. (Econ.) 1979]


(b) Describe the methods generally employed in the collection of statis- 
tical data, stating briefly their merits and demerits. 
[Punjab U. M. A. (Econ.) October 1981] 


5. (a) Distinguish between Primary and Secondary data. Give a brief
account of the chief methods of collecting Primary Data and bring out their
merits and defects.

[Delhi U. B. Com. (Hons.) 1979, 1977]


(b) Discuss the various methods of collecting ‘primary data’. State the
methods you would employ to collect information about utilisation of plant
capacity in small-scale sector in the Union Territory of Delhi.

[Delhi U. B. Com. (External) 1976]


6. Distinguish between ‘Primary Data’ and ‘Secondary Data’. State
the chief sources of Secondary Data. What precautions are to be observed when
such data are to be used for any investigation ?

(Delhi U. B. Com. 1977)


7. A firm's own records are internal data. What is meant by external
data, primary data, a primary source and a secondary source ? Which is
preferred, primary sources or secondary sources, and why ? Why do you suppose
secondary sources are so often used ?

(Punjab U. B. Com. Sept. 1978)


8. (a) What are consumer primary and secondary data ? State those
factors which should be kept in mind while using secondary data for the
investigation.

(Guru Nanak Dev U. B. Com. 1980)

(b) Distinguish between primary and secondary data. What precautions
should be taken in the use of secondary data ?

(Allahabad U. B. Com. 1982 ; Lucknow U. B. Com. 1982 ;
Nagarjuna U. B. Com. 1981)


9. (a) “In collection of statistical data common sense is the chief
requisite and experience the chief teacher”. Discuss the above statement with
comments.

(b) “It is never safe to take published statistics at their face value
without knowing their meanings and limitations and it is always necessary to
criticise the arguments that can be based on them.” (Bowley). Elucidate.


10. (a) Define a statistical unit and explain what should be the essential
requirements of a good statistical unit.
(Osmania U. B. Com. April 1978)

(b) What are the essential points to be remembered in the choice of
statistical units ?
[C.A. (Intermediate) May 1974 (O.S.)]


11. What is a statistical unit ? What do you mean by units of collection 
and units of analysis ? Discuss their relative uses. 


12. What are the essentials of a questionnaire ? Draft a questionnaire
not exceeding ten questions to study the views on educational programmes of
television and indicate an outline of the design of the survey.
(Bombay U. B. Com. 1977)


13. What do you mean by a questionnaire ? What is the difference
between a questionnaire and a schedule ? State the essential points to be
remembered in drafting a questionnaire.
(Guru Nanak Dev U. B. Com. Sept. 1981)


14. Discuss the essentials of a good questionnaire. “It is proposed to
conduct a sample survey to obtain information on the study habits of
University students in Chandigarh and the facilities available to them”. Explain
how you will plan the survey. Draft a suitable questionnaire for this purpose.

(Punjab U. B. Com. 1981)


15. You are the Sales Promotion Officer of Delta Cosmetics Co. Ltd.
Your company is about to market a new product. Design a suitable
questionnaire to conduct a consumer survey before the product is launched. State
the various types of persons that may be approached for replying to the
questionnaire.
[Delhi U. B. Com. 1974]


16. It is required to collect information on the economic conditions of
textile mill workers in Bombay. Suggest a suitable method for collection of
primary data. Draft a suitable questionnaire of about ten questions for
collecting this information. Also suggest how you will proceed to carry out
statistical analysis of the information collected.
[I.C.W.A. (Intermediate) June 1976]


17. What are the essentials of a good questionnaire ? Draft a suitable
questionnaire to enable you to study the effects of super markets on prices of
essential consumer goods.
(Madras U. B. Com. April 1978)


18. What are the chief features of a good questionnaire ? What
precautions do you take while drafting a questionnaire ?
(Nagarjuna U. B. Com. Oct. 1981)


19. How would you organise an enquiry into the cost of living of the
[…] community in Guntur ? Draw up a blank form to obtain the required
information.
(Nagarjuna U. B. Com. Oct. 1980)


20. Fill in the blanks :


(iii) ...... is a suitable method of collecting data in cases where the
informants are literate and spread over a vast area.


(Madras U. B. Com. April 1977)


Ans. (i) Four ; (ii) Primary and Secondary ; (iii) Mailed questionnaire
method.


21. What methods would you employ in collection of data considering
accuracy, time and cost involved when the field of enquiry is :
(i) small (ii) fairly large (iii) very large.
(Punjab U. B. Com. April 1977)
Ans. (i) Direct personal interview ; (ii) and (iii) Mailed questionnaire
method.


22. Assume that you employ the following data while conducting a 
statistical investigation : 


(i) Estimate of personal income taken from R.B.I. bulletin.


(ii) Financial data of Indian companies taken from the annual reports
of the Ministry of Law and Company Affairs.


(iii) Tabulation from schedules used in interviews that you yourself
conducted.


(iv) Data collected by the National Sample Survey.
Which of the above is Primary Data ?
(Bangalore U. M. Com. 1977)


Ans. Only (iii). 


23. Explain the necessity of editing primary and secondary data and
briefly discuss points to be considered while editing such data.


Classification and Tabulation


3.1. Introduction. In the last chapter we described the
various methods of collecting data for any enquiry. Unfortunately,
the data collected in any statistical investigation, known as raw
data, are so voluminous that they are unwieldy and
incomprehensible. So, having collected and edited the data, the
next important step is to organise them, i.e., to present them in a readily
comprehensible condensed form which will highlight the important
characteristics of the data, facilitate comparisons and render them
suitable for further processing (statistical analysis) and
interpretation.


The presentation of the data is broadly classified into the
following two categories :


(i) Tabular Presentation 
(ii) Diagrammatic or Graphic Presentation. 


A statistical table is an orderly and logical arrangement of
data into rows and columns and it attempts to present the voluminous
and heterogeneous data in a condensed and homogeneous
form. But before tabulating the data, generally, systematic arrangement
of the raw data into different homogeneous classes is necessary
to sort out the relevant and significant features (details) from the
irrelevant and insignificant ones.


This process of arranging the data into groups or classes
according to resemblances and similarities is technically called
classification. Thus classification of the data is preliminary to its
tabulation. It is the first step in tabulation because the items
with similarities must be brought together before the data are
presented in the form of a ‘table’.


On the other hand, the diagrams and graphs are pictorial
devices for presenting the statistical data. However, in this chapter,
we shall discuss only ‘Classification and Tabulation’ of the data
while ‘Diagrammatic and Graphic Presentation’ of the data will be
discussed in the next chapter (Chapter 4).


3.2. Classification. It is of interest to give below the
following definitions of Classification :

“Classification is the process of arranging data into sequences 
and groups according to their common characteristics, or separating 
them into different but related parts.”—Secrist. 

“A classification is a scheme for breaking a category into a set of
parts, called classes, according to some precisely defined differing 
characteristics possessed by all the elements of the category.” 

— Tuttle A.M. 

Thus classification lays stress on the arrangement of the
data into different classes which are to be determined depending
upon the nature, objectives and scope of the enquiry. For instance,
the number of students registered in Delhi University during the
academic year 1983-84 may be classified on the basis of any of the
following criteria :

(i) Sex

(ii) Age

(iii) The state to which they belong
(iv) Religion
(v) Different faculties, like Arts, Science, Humanities, Law,
Commerce, etc.
(vi) Heights or weights
(vii) Institutions (Colleges) and so on.

Thus the same set of data can be classified into different
groups or classes in a number of ways based on any recognisable
physical, social or mental characteristic which exhibits variation
among the different elements of the given data. The facts in one
class will differ from those of another class w.r.t. some characteristic
called the basis or criterion of classification.


As an illustration, the data relating to a socio-economic enquiry,
e.g., the family budget data relating to nature, quality and quantity
of the commodities consumed by a group of people together with
expenditure on different items of consumption may be classified
under the following heads :

(i) Food

(ii) Clothing
(iii) Fuel and Lighting
(iv) House rent
(v) Miscellaneous (including items like education, recreation,
medical expenses, gifts, newspaper, dhobi, etc.)

Each of the above groups or classes may further be divided
into sub-groups or sub-classes. For example, ‘Food’ may be
sub-divided into Cereals (rice, wheat, maize, pulses etc.) ; Vegetables ;
Milk and milk products ; Oil and ghee ; Fruits and Miscellaneous.


Thus it may be understood that to analyse any statistical data
classification may not be limited to one criterion or basis only. We
might classify the given data w.r.t. two or more criteria or bases
simultaneously. This technique of dividing the given data into
different classes w.r.t. more than one basis simultaneously is called
cross-classification and this process of further classification may be
carried on as long as there are possible bases for classification. For
instance, the students in the university may be simultaneously
classified w.r.t. sex and faculty or w.r.t. age, sex and religion (three
criteria) simultaneously and so on.
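
The idea of simple classification and cross-classification can be illustrated with a short Python sketch. The student records below are hypothetical and serve only as an illustration; each record carries two criteria, sex and faculty, and the function names are our own choices.

```python
from collections import Counter

# Hypothetical student records: (sex, faculty). Illustrative only.
students = [
    ("Male", "Arts"), ("Female", "Science"), ("Male", "Commerce"),
    ("Female", "Arts"), ("Male", "Arts"), ("Female", "Commerce"),
    ("Female", "Science"), ("Male", "Science"),
]


def classify_by(records, position):
    """Simple classification: count records w.r.t. one criterion only."""
    return Counter(record[position] for record in records)


def cross_classify(records):
    """Cross-classification: count records in each (sex, faculty) cell."""
    return Counter(records)


by_sex = classify_by(students, 0)
by_faculty = classify_by(students, 1)
cells = cross_classify(students)

# Print one line per non-empty cell of the two-way classification.
for (sex, faculty), count in sorted(cells.items()):
    print(f"{sex:<8}{faculty:<10}{count}")
```

Each student falls in exactly one cell, so the cell counts add up to the total number of students, whichever criteria are used.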


3.2.1. Functions of Classification. The functions of
classification may be briefly summarised as follows :


(i) It condenses the data. Classification presents the huge
unwieldy raw data in a condensed form which is readily
comprehensible to the mind and attempts to highlight the significant
features contained in the data.


(ii) It facilitates comparisons. Classification enables us to
make meaningful comparisons depending on the basis or criterion
of classification. For instance, the classification of the students in the
university according to sex enables us to make a comparative study
of the prevalence of university education among males and females.


(iii) It helps to study the relationships. The classification of
the given data w.r.t. two or more criteria, say, the sex of the
students and the faculty they join in the university will enable us to
study the relationship between these two criteria.


(iv) It facilitates the statistical treatment of the data. The
arrangement of the voluminous heterogeneous data into relatively
homogeneous groups or classes according to their points of
similarity introduces homogeneity or uniformity amidst diversity and
makes them more intelligible, useful and readily amenable to further
processing like tabulation, analysis and interpretation of the data.


3.2.2. Rules for Classification. Although classification is
one of the most important techniques for the statistical treatment
and analysis of numerical data, no hard and fast rules can be laid
down for it. Obviously, a technically sound classification of the
data in any statistical investigation will primarily depend on the
nature of the data and the objectives of the enquiry. However,
consistent with the nature and objectives of the enquiry, the
following general guiding principles may be observed for good
classification :

(i) It should be unambiguous. The classes should be rigidly
defined so that they do not lead to any ambiguity. In other
words, there should not be any room for doubt or confusion
regarding the placement of the observations in the given classes. For
example, if we have to classify a group of individuals as ‘employed’
and ‘unemployed’ or ‘literate’ and ‘illiterate’, it is imperative to
define in clear-cut terms what we mean by an employed
person and an unemployed person ; by a literate person and an
illiterate person.


(ii) It should be exhaustive and mutually exclusive. The
classification must be exhaustive in the sense that each and every item
in the data must belong to one of the classes. A good classification
should be free from the residual class like ‘others’ or ‘miscellaneous’
because such classes do not reveal the characteristics of the data
completely. However, if the classes are very large in number, as is
the case in classifying various commodities consumed by people in
a certain locality, it becomes necessary to introduce this ‘residual
class’, otherwise the purpose of classification, viz., condensation of
the data, will be defeated.


Further, the various classes should be mutually disjoint or
non-overlapping so that an observed value belongs to one and only
one of the classes. For instance, if we classify the students in a
college by sex, i.e., as males and females, the two classes are mutually
exclusive. But if the same group is classified as males, females and
addicts to a particular drug then the classification is faulty because
the group ‘addicts to a particular drug’ includes both males and
females. However, in such a case, a proper classification will be
w.r.t. two criteria, viz., w.r.t. sex (males and females) and further
dividing the students in each of these two classes into ‘addicts’ and
‘non-addicts’ to the given drug.
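
The two requirements can be checked mechanically. The following Python sketch, built on hypothetical student records and class definitions of our own choosing, reports the items that fall in no class (the classification is not exhaustive) and the items that fall in more than one class (the classes are not mutually exclusive).

```python
def check_classification(items, classes):
    """Return (unclassified, overlapping) for the given class definitions.

    classes maps a class name to a rule (item -> True/False).
    unclassified lists items belonging to no class;
    overlapping lists (item, class names) for items in two or more classes.
    """
    unclassified = []
    overlapping = []
    for item in items:
        matches = [name for name, belongs in classes.items() if belongs(item)]
        if not matches:
            unclassified.append(item)
        elif len(matches) > 1:
            overlapping.append((item, matches))
    return unclassified, overlapping


# Hypothetical students: (name, sex, addicted to the given drug).
students = [("A", "M", True), ("B", "F", False),
            ("C", "M", False), ("D", "F", True)]

# Faulty scheme: 'addicts' overlaps with both 'males' and 'females'.
faulty = {
    "males": lambda s: s[1] == "M",
    "females": lambda s: s[1] == "F",
    "addicts": lambda s: s[2],
}

# Proper scheme: sex first, then addicts / non-addicts within each sex.
proper = {
    "male addicts": lambda s: s[1] == "M" and s[2],
    "male non-addicts": lambda s: s[1] == "M" and not s[2],
    "female addicts": lambda s: s[1] == "F" and s[2],
    "female non-addicts": lambda s: s[1] == "F" and not s[2],
}

faulty_gaps, faulty_overlaps = check_classification(students, faulty)
proper_gaps, proper_overlaps = check_classification(students, proper)
```

Under the faulty scheme the two addicted students each belong to two classes, whereas under the proper scheme every student belongs to one and only one class.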


(iii) It should be stable. In order to have meaningful
comparisons of the results, an ideal classification must be stable, i.e., the
same pattern of classification should be adopted throughout the
analysis and also for further enquiries on the same subject. For
instance, in the 1961 census, the population was classified w.r.t.
profession in the four classes, viz., (i) working as cultivator, (ii) working
as agricultural labourer, (iii) working at household industry and
(iv) others. However, in the 1971 census, the classification w.r.t.
profession was as under :


(a) Main Activity :
(i) Worker [Cultivator (C), Agricultural labourer (AL), Household
industries (HHI), Other works (OW)].


(b) Broad Category. Non-worker [Household duties (H) ;
Student (ST) ; Rentier or Retired person (R) ; Dependent, Beggars,
Institutions and Others (DBIO)].

Consequently the results obtained in the two censuses cannot
be compared meaningfully. Hence, having decided about the basis
of classification in an enquiry, we should stick to it for other related
matters in order to have meaningful comparisons.

(iv) It should be suitable for the purpose. The classification
must be in keeping with the objectives of the enquiry. For instance,
if we want to study the relationship between university education
and sex, it will be futile to classify the students w.r.t. age
and religion.

(v) It should be flexible. A good classification should be
flexible in that it should be adjustable to new and changed situations
and conditions. No classification is good enough to be used for
ever ; changes here and there become necessary with the changes in
time and changed circumstances. However, flexibility should not
be interpreted as instability of classification. The classification can
be kept flexible by classifying the given population into some major
groups which more or less remain stable and allowing for adjustment
due to changed circumstances or conditions by sub-dividing
these major groups into sub-groups or sub-classes which can be made
flexible. Hence the classification can maintain the character of
flexibility along with stability.

3.2.3. Bases of Classification. The bases or the criteria
on which the data are classified primarily depend on the objectives
and the purpose of the enquiry. Generally, the data can be
classified on the following four bases :

(i) Geographical i.e., Area-wise or Regional.

(ii) Chronological i.e., w.r.t. occurrence of time.

(iii) Qualitative i.e., w.r.t. some character or attribute.

(iv) Quantitative i.e., w.r.t. numerical values or magnitudes.

In the following section we shall briefly discuss them one by
one.

Geographical Classification. As the name suggests, in this
classification the basis of classification is the geographical or
locational differences between the various items in the data like States,
Cities, Regions, Zones, Areas etc. For example, the yield of
agricultural output per hectare for different countries in some given period
and the density of the population (per square km.) in different cities
of India are given in the following tables.

TABLE 3.1
AGRICULTURAL OUTPUT OF DIFFERENT COUNTRIES
[In Kg. Per Hectare]

TABLE 3.2
DENSITY OF POPULATION (Per Square Kilometre)
IN DIFFERENT CITIES OF INDIA
Cities : Calcutta, Bombay, Delhi, Madras, Chandigarh

[Table entries not legible in the source.]

Source : Yojna, Vol. XV, No. 18,
19th Sept. 1971, page 22.
In the above classification, the daily earnings of the stores are
termed as variable and the number of stores in each class as the
frequency. The above classification is termed as grouped frequency
distribution.


Variable. As already pointed out, the quantitative
phenomenon under study, like marks in a test, heights or weights of the
students in a class, wages of workers in a factory, sales in a
departmental store etc., is termed a variable or a variate. It may be noted
that different variables are measured in different units, e.g., age is
measured in years, height in inches or cms ; weight in lbs. or kgs.,
income in rupees and so on.


Variables are of two kinds : 
(i) Continuous variable. 
(ii) Discrete variable (Discontinuous variable). 


Those variables which can take all the possible values (integral
as well as fractional) in a given specified range are termed as
continuous variables. For example, the age of students in a school
(Nursery to Higher Secondary) is a continuous variable because age
can take all possible values (as it can be measured to the nearest
fraction of time : years, months, days, minutes, seconds etc.), in a
certain range, say, from 3 years to 20 years. Some other examples
of continuous variable are heights (in cms), weight (in lbs.), distance
(in kms). More precisely, a variable is said to be continuous if it is
capable of passing from any given value to the next value by
infinitely small gradations.


On the other hand those variables which cannot take all the
possible values within a given specified range are termed as discrete
(discontinuous) variables. For example, the marks in a test (out of
100) of a group of students is a discrete variable since in this case
marks can take only integral values from 0 to 100 (or it may take
halves or quarters also if such fractional marks are given. Usually,
fractional marks, if any, are rounded to the nearest integer). It
cannot take all the values (integral as well as fractional) from 0 to
100. Some other examples of discrete variable are family size
(members in a family), the population of a city, the number of
accidents on the road, the number of typing mistakes per page and
so on. A discrete variable is, thus, characterised by jumps and gaps
between the one value and the next. Usually, it takes integral values
in a given range which depends on the variable under study. A
detailed discussion of the quantitative classification of a series or set
of observations is given in § 3.3 “Frequency Distributions”.


Remark. Values of the variable in a ‘given specified range’
are determined by the nature of the phenomenon under study. In
case of heights of students in a college the range may be 4' 6" (i.e.,
137 cms.) to 6' 3" (i.e., 190 cms.) ; in case of weights, it may be
from 100 lbs. to 200 lbs., say ; in case of marks in a test out of 25,
it will be 0 to 25 and so on.


3.3. Frequency Distribution. The organisation of the
data pertaining to a quantitative phenomenon involves the following
four stages :


(i) The set or series of individual observations, i.e., unorganised
(raw) or organised (arrayed) data.

(ii) Discrete or ungrouped frequency distribution.

(iii) Grouped frequency distribution.

(iv) Continuous frequency distribution.

We shall explain the various stages by means of a numerical
illustration.


Let us consider the following distribution of marks of 200
students in an examination, arranged serially in order of their roll
numbers.


TABLE 3.5 
MARKS OF 200 STUDENTS 


The data in the above form are called the raw or disorganised
data. In the raw form the data are so unwieldy and scattered that
even after a very careful perusal, the various details contained in
them remain unclear and incomprehensible. The above
presentation of the data in its raw form does not give us any useful
information and is rather confusing to the mind. Our objective will be
to express the huge mass of data in a suitable condensed form which
will highlight the significant facts and comparisons and furnish more
useful information without sacrificing any information of interest
about the important characteristics of the distribution.

3.3.1. Array. A better presentation of the above raw data
would be to arrange them in an ascending or descending order of
magnitude which is called the ‘arraying’ of the data. However, this
presentation (arraying), though better than the raw data, does not
reduce the volume of the data.
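
As a small illustration, arraying can be sketched in Python as below. The eight marks are hypothetical and are not taken from Table 3.5.

```python
# Hypothetical raw marks in roll-number order (illustrative only).
raw_marks = [42, 15, 38, 75, 42, 60, 38, 51]

# Arraying: arrange the observations in order of magnitude.
ascending = sorted(raw_marks)
descending = sorted(raw_marks, reverse=True)

print(ascending)
print(descending)
```

The array has exactly as many entries as the raw data, which is why arraying by itself gives no condensation.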


3.3.2. Discrete or Ungrouped Frequency Distribution.
A much better way of representing the data is to express them
in the form of a discrete or ungrouped frequency distribution where we
count the number of times each value of the variable (marks in the
above illustration) occurs in the above data. This is facilitated
through the technique of Tally-Marks or Tally-Bars as explained
below :


In the first column we place all the possible values of the
variable (marks in the above case). In the second column a vertical bar
(|) called the Tally Mark is put against the number (value of the
variable) whenever it occurs. After a particular value has occurred
four times, for the fifth occurrence we put a cross tally mark ( / ) on
the first four tally marks, like ||||/, to give us a block of 5. When it
occurs for the 6th time we put another tally mark against it (after
leaving some space from the first block of 5) and for the 10th
occurrence we again put a cross tally mark (/) on the 6th to 9th tally
marks to get another block of 5 and so on. This technique of
putting cross tally marks at every 5th repetition (giving groups of 5 each)
facilitates the counting of the number of occurrences of the value at
the end. In the absence of such cross tally marks we shall get
continuous tally bars like |||||||... and there may be confusion in
counting and we are liable to commit mistakes also. Thus the 2nd column
consists of tally marks or tally bars. After putting tally marks for
all the values in the data, we count the number of times each value
is repeated and write it against the corresponding number (value of
the variable) in the third column, entitled frequency. This type of
representation of the data is called discrete or ungrouped frequency
distribution. The marks (which vary from student to student) are
called the variable under study and the number of students against
the corresponding marks (which tell us how frequently the marks
occur) is called the frequency ( f ) of the variable. Table 3.6
gives the ungrouped frequency distribution of the data in Table 3.5,
along with the tally marks.
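
The tally procedure described above can be sketched in Python as follows. The marks used are hypothetical and are not the data of Table 3.5; a block of five is written here as ||||/, and, unlike the first column described above, only the values that actually occur are listed.

```python
from collections import Counter


def tally(count):
    """Render a count as tally marks in blocks of five, e.g. 6 -> '||||/ |'."""
    blocks, rest = divmod(count, 5)
    parts = ["||||/"] * blocks
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)


def ungrouped_frequency(values):
    """Return (value, tally marks, frequency) rows in ascending order of value."""
    counts = Counter(values)
    return [(value, tally(counts[value]), counts[value])
            for value in sorted(counts)]


# Hypothetical marks of 12 students (illustrative only).
marks = [38, 42, 42, 38, 40, 42, 38, 42, 42, 40, 42, 38]

table = ungrouped_frequency(marks)
print("Marks  Tally Bars  Frequency")
for value, bars, frequency in table:
    print(f"{value:>5}  {bars:<12}{frequency:>3}")
```

The frequencies in the last column add up to the number of observations, which serves as a quick check on the tallying.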


From the frequency table (Table 3.6) we observe that there are 8 students getting 38 marks, 14 students getting 42 marks, only 1 student getting 75 marks, and so on. The presentation of the data in the form of an ungrouped frequency distribution is better than ‘arraying’, but still it does not condense the data much and is quite cumbersome to grasp and comprehend. The ungrouped frequency distribution is quite handy (i) if the values of the variable are largely repeated (otherwise there will be hardly any condensation), or (ii) if the variable (X) under consideration takes only a few values. For instance, if X were the marks out of 10 in a test given to 200 students, then X is a variable taking the values in the range 0 to 10

TABLE 3.6
MARKS OF 200 STUDENTS

Marks    Tally Bars    Frequency    Marks    Tally Bars    Frequency

[Table entries not recoverable.]

and can be conveniently represented by an ungrouped frequency distribution. However, if the variable takes values in a wide (large) range, as in the above illustration in Table 3.6, the data still remain unwieldy and need further processing for statistical analysis.


3.3.3. Grouped Frequency Distribution. If the identity of the units (students in our example) about whom a particular piece of information is collected (marks in the above illustration) is not relevant, nor is the order in which the observations occur, then the first real step of condensation consists in classifying the data into different classes (or class intervals) by dividing the entire range of the values of the variable into a suitable number of groups called classes and then recording the number of observations in each group (or class). Thus, in the above data of Table 3.6, if we divide the total range of the values of the variable, viz., 78 - 15 = 63, into groups of size 5 each, then we shall get 63/5 = 12.6, i.e., 13 groups, and the distribution of marks is then given by the following grouped frequency distribution.

TABLE 3.7
MARKS OF 200 STUDENTS

Marks    No. of Students (f)

[Table entries not recoverable.]

The various groups into which the values of the variable are classified are known as classes or class intervals; the length of the class interval (which is 5 in the above case) is called the width or magnitude of the class. The two values specifying the class are called the class limits: the bigger value is the upper class limit and the smaller value is the lower class limit.
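
As a rough sketch of the grouping step, the following Python fragment sorts integer marks into classes of width 5 starting at the smallest value 15. The marks are hypothetical (only the range 15 to 78 is taken from the text), and the function name `grouped_frequency` is an assumption made for this illustration.

```python
import math
from collections import Counter

def grouped_frequency(values, start, width):
    """Count observations in inclusive integer classes start..start+width-1, etc."""
    counts = Counter((v - start) // width for v in values)
    n_classes = max(counts) + 1
    return [
        (start + k * width, start + (k + 1) * width - 1, counts.get(k, 0))
        for k in range(n_classes)
    ]

# Hypothetical marks spanning the range 15..78 quoted in the text (not the book's data).
marks = [15, 18, 22, 27, 31, 38, 38, 42, 42, 42, 55, 61, 75, 78]

n_groups = math.ceil((78 - 15) / 5)    # 63/5 = 12.6, so 13 groups are needed
table = grouped_frequency(marks, start=15, width=5)
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
```

Each tuple holds the lower class limit, the upper class limit and the class frequency, so empty classes appear with frequency 0 rather than being dropped.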

3.3.4. Continuous Frequency Distribution. While dealing with a continuous variable it is not desirable to present the data in a grouped frequency distribution of the type given in Table 3.7. For example, if we consider the ages of a group of students in a school, then the grouped frequency distribution into the classes 4-6, 7-9, 10-12, 13-15, etc., will not be correct, because it does not take into consideration the students with ages between 6 and 7 years, i.e., 6 < X < 7; between 9 and 10 years, i.e., 9 < X < 10; and so on. In such situations we form continuous class intervals (without any gaps) of the following type:

Age in years : 

Below 6 

6 or more but less than 9 

9 or more but less than 12 

12 or more but less than 15 
and so on, which takes care of all the students with any fractions of age.

The presentation of the data into continuous classes of the above type along with the corresponding frequencies is known as a continuous frequency distribution. [For further detailed discussion, see Types of Classes: Inclusive and Exclusive, § 3.4.4.]


3.4. Basic Principles for Forming a Grouped Frequency Distribution. In spite of the great importance of classification in statistical analysis, no hard and fast rules can be laid down for it. A statistician uses discretion in classifying a frequency distribution, and sound experience, wisdom, skill and aptness are required for an appropriate classification of the data. However, the following general guidelines may be borne in mind for a good classification of the frequency data.


3.4.1. Types of Classes. The classes should be clearly defined and should not lead to any ambiguity. Further, they should be exhaustive and mutually exclusive (i.e., non-overlapping) so that any value of the variable corresponds to one and only one of the classes. In other words, every value of the variable is assigned to exactly one class.


3.4.2. Number of Classes. Although no hard and fast rule exists, the choice of the number of classes (class intervals) into which a given frequency distribution can be divided primarily depends upon:

(i) The total frequency (i.e., total number of observations in the distribution).

(ii) The nature of the data, i.e., the size or magnitude of the values of the variable.

(iii) The accuracy aimed at, and

(iv) The ease of computation of the various descriptive measures of the frequency distribution such as mean, variance, etc., for further processing of the data.


However, from a practical point of view the number of classes should neither be too small nor too large. If too few classes are used, the classification becomes very broad and rough in the sense that too many frequencies will be concentrated or crowded in a single class. This might obscure some important features and characteristics of the data, thereby resulting in loss of information. Moreover, with too few classes the basic assumption that class marks (i.e., mid-values of the classes) are representative of the class for computation of further descriptive measures of the distribution like mean, variance, etc., will not be valid, and the so-called grouping error will be larger in such cases. Consequently, in general, the accuracy of the results decreases as the number of classes becomes smaller and smaller. On the other hand, too many classes will result in too few frequencies in each class. This might give an irregular pattern of frequencies in different classes, thus making the frequency distribution (frequency polygon) irregular. Moreover, a large number of classes will render the distribution too unwieldy to handle, thus defeating the very purpose of classification, viz., summarisation of the data. Further, the computational work for further processing of the data will unnecessarily become quite tedious and time consuming without any proportionate gain in the accuracy of the results. A balance should therefore be struck between these two factors, viz., the loss of information in the first case (too few classes) and the irregularity of the frequency distribution in the second case (too many classes), to arrive at a pleasing compromise giving the optimum number of classes in the view of the statistician. Ordinarily, the number of classes should not be greater than 20 and should not be less than 5, of course keeping in view the points (i) to (iv) given above together with the magnitude of the class interval, since the number of classes is inversely proportional to the magnitude of the class interval.


A number of rules of thumb have been proposed for calculating the proper number of classes. However, an elegant, though approximate, formula seems to be the one given by Prof. Sturges, known as Sturges' rule, according to which

    $k = 1 + 3.322 \log_{10} N$    ...(3.1)

where k is the number of class intervals (classes) and N is the total frequency, i.e., the total number of observations in the data. The value obtained in (3.1) is rounded to the nearest integer.


Since the logarithm (to base 10) of a one-digit number is 0.(...), of a two-digit number is 1.(...), of a three-digit number is 2.(...) and of a four-digit number is 3.(...), the use of formula (3.1) restricts the value of k, the number of classes, to be fairly reasonable. For example:

If N = 10,       $k = 1 + 3.322 \log_{10} 10 = 4.322 \approx 4$

If N = 100,      $k = 1 + 3.322 \log_{10} 100 = 1 + 3.322 \times 2 = 1 + 6.644 = 7.644 \approx 8$

If N = 500,      $k = 1 + 3.322 \log_{10} 500 = 1 + 3.322 \times 2.6990 = 1 + 8.966 = 9.966 \approx 10$

If N = 1000,     $k = 1 + 3.322 \log_{10} 1000 = 1 + 3.322 \times 3 = 1 + 9.966 = 10.966 \approx 11$

If N = 10000,    $k = 1 + 3.322 \times 4 = 1 + 13.288 = 14.288 \approx 14$


Accordingly the Sturges formula (3.1) very ingeniously restricts 
the number of classes between 4 and 20, which is a fairly reasonable 
number from practical point of view. 


The rule, however, fails if the number of observations is very 
large or very small. 
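
A minimal sketch of Sturges' rule (3.1) in Python, rounding to the nearest integer as the worked values above do. The function name `sturges_classes` is an assumption made for this illustration.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes k = 1 + 3.322 log10(N), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))

for n in (10, 100, 500, 1000, 10000):
    print(n, sturges_classes(n))
```

For N = 10, 100, 500, 1000 and 10000 this reproduces the values 4, 8, 10, 11 and 14 obtained by hand.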


Remarks. 1. The number of class intervals should be such that they usually give a uniform and unimodal distribution, in the sense that the frequencies in the given classes first increase steadily, reach a maximum and then decrease steadily. There should not be any sudden jumps or falls, which result in a so-called irregular distribution. The maximum frequency should not occur at the very beginning or at the end of the distribution, nor should it be repeated, since in these cases we get an irregular distribution.


2. The number of classes should be a whole number (integer), preferably 5 or some multiple of 5, viz., 10, 15, 20, 25, etc., which are readily perceptible to the mind and are quite convenient for numerical computations in the further processing (statistical analysis) of the data. Uncommon figures like 3, 7, 11, etc., should be avoided as far as possible.

3.4.3. Size of Class Intervals. Since the size of the class interval is inversely proportional to the number of classes (class intervals) in a given distribution, from the above discussion it is obvious that a choice about the size of the class interval will also largely depend on the sound subjective judgement of the statistician, keeping in mind other considerations like N (total frequency), nature of the data, accuracy of the results and computational ease for further processing of the data. Here an approximate value of the magnitude (or width) of the class interval, say i, can be obtained by using Sturges' rule (3.1), which gives:

    $i = \dfrac{\text{Range}}{\text{Number of classes}} = \dfrac{\text{Range}}{1 + 3.322 \log_{10} N}$    [Using (3.1)]

where the Range of the distribution is given by the difference between the largest (L) and the smallest (S) value in the distribution, i.e.,

    $\text{Range} = X_{\max} - X_{\min} = L - S$    ...(3.2)

Hence

    $i = \dfrac{L - S}{1 + 3.322 \log_{10} N}$    ...(3.3)

Another rule of thumb for determining the size of the class interval is that “the length of the class interval should not be greater than 1/4 of the estimated population standard deviation.”* Thus if s is the estimate of the population standard deviation, then the length of the class interval is given by

    $i \le s/4 = A$ (say).    ...(3.4)

Remarks. 1. Incidentally, (3.4) also enables us to have an idea about the minimum number of classes (k), which will be given by

    $k = \dfrac{\text{Range}}{A}$    ...(3.5)

where Range is defined in (3.2) and A = s/4.

If we consider a hypothetical frequency distribution of the life time of 400 radio bulbs tested at a certain company, with the result that the minimum life time is 340 hours and the maximum life time is 1300 hours, then in the usual notation N = 400, L = 1300 hrs., S = 340 hrs., and using (3.3) we get:

    $i = \dfrac{1300 - 340}{1 + 3.322 \log_{10} 400} = \dfrac{960}{1 + 3.322 \times 2.6021} = \dfrac{960}{1 + 8.644} = \dfrac{960}{9.644} = 99.54 \approx 100$    ...(*)

If the magnitude of the class interval is taken as 100, then the number of classes will be 10 [which is nothing but the value 9.644 ≈ 10 in the denominator of (*)].

* For a detailed discussion of standard deviation, see Chapter 6.
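
The class-width calculation of (3.3) for the radio bulb illustration can be checked with a short Python sketch. The function name `sturges_width` is an assumption made here; the figures N = 400, L = 1300 and S = 340 come from the text.

```python
import math

def sturges_width(largest: float, smallest: float, n: int) -> float:
    """Approximate class width i = (L - S) / (1 + 3.322 log10 N), as in (3.3)."""
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

width = sturges_width(1300, 340, 400)       # close to 99.5, rounded to 100 in the text
n_classes = math.ceil((1300 - 340) / 100)   # classes of width 100 needed to cover the range
print(round(width, 2), n_classes)
```

Rounding the width up to the convenient figure 100 is a judgement call of the kind the section describes, not something the formula itself dictates.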


2. Like the number of classes, as far as possible, the size of class intervals should also be taken as 5 or some multiple of 5, viz., 10, 15, 20, etc., to facilitate computation of the various descriptive measures of the frequency distribution like mean ($\bar{X}$), standard deviation ($\sigma$), moments, etc.


3. Class intervals should be so fixed that each class has a convenient mid-point about which all the observations in the class cluster or concentrate. In other words, this amounts to saying that the entire frequency of the class is concentrated at the mid-value of the class. This assumption will hold only if the frequencies of the different classes are uniformly distributed within the respective class intervals. This is a very fundamental assumption in statistical theory for the computation of various statistical measures like mean, standard deviation, etc.


4. From the point of view of practical convenience, as far as possible, it is desirable to take class intervals of equal or uniform magnitude throughout the frequency distribution. This will facilitate the computation of various statistical measures and also result in meaningful comparisons between different classes and different frequency distributions. Further, frequency distributions with equal classes can be represented diagrammatically with greater ease and utility, whereas in the case of classes with unequal widths the diagrammatic representation might give a distorted picture and thus lead to fallacious interpretations. However, it may be neither practicable nor desirable to keep the magnitudes of the class intervals equal if there are very wide gaps in the observed data, e.g., in the frequency distribution of incomes, wages, profits, savings, etc. For example, in the frequency distribution of income, larger class intervals would obscure (sacrifice) all the details about the smaller incomes, and smaller classes would give quite an unwieldy frequency distribution. Such distributions are quite common in many economic and medical data, where we have to be content with classes of unequal width.


3.4.4. Types of Class Interval. As already stated, each class is specified by two extreme values called the class limits, the smaller one being termed the lower limit and the larger one the upper limit of the class. The classification of a frequency distribution into various classes is of the following types:

(a) Inclusive Type Classes. Classes of the type 30-39, 40-49, 50-59, 60-69, etc., in which both the upper and lower limits are included, are called ‘inclusive classes’. For instance, the class interval 40-49 includes all the values from 40 to 49, both inclusive. The next value, viz., 50, is included in the next class 50-59, and so on. However, the fractional values between 49 and 50 cannot be accounted for in such a classification. Hence the ‘inclusive type’ of classification may be used for a grouped frequency distribution of a discrete variable like marks in a test, number of accidents on the road, etc., where the variable takes only integral values. It cannot be used with advantage for the frequency distribution of a continuous variable like age, height, weight, etc., where all values (integral as well as fractional) are permissible.


(b) Exclusive Type Classes. Let us consider the distribution of ages of a group of persons into classes 15-19, 20-24, 25-29, etc., each of magnitude 5. This classification of ‘inclusive type’ for ages is defective in the sense that it does not account for the individuals with ages more than 19 years but less than 20 years. In such a situation (where the variable is continuous), the classes have to be made without any gaps, as given below:

    15 years and over but under 20
    20 years and over but under 25        (*)
    25 years and over but under 30

and so on, each class in this case also being of magnitude 5. More precisely, the above classes can be written as:

    15-20            $15 \le X < 20$
    20-25    i.e.    $20 \le X < 25$        (**)
    25-30            $25 \le X < 30$

and so on, where it should be clearly understood that in the above classes the upper limit of each class is excluded from that class. Such classes, in which the upper limits are excluded from the respective classes and are included in the immediate next class, are termed ‘exclusive classes’.
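
The difference between the two types of classes can be illustrated with a small Python sketch. The helper `find_class` is a hypothetical name introduced for this example; the class limits are those used in the text.

```python
def find_class(x, classes, inclusive):
    """Return the (lower, upper) class containing x, or None if x falls in a gap."""
    for lower, upper in classes:
        if lower <= x and (x <= upper if inclusive else x < upper):
            return (lower, upper)
    return None

inclusive_classes = [(15, 19), (20, 24), (25, 29)]
exclusive_classes = [(15, 20), (20, 25), (25, 30)]

print(find_class(19.5, inclusive_classes, inclusive=True))    # None: falls in the gap
print(find_class(19.5, exclusive_classes, inclusive=False))   # (15, 20)
print(find_class(20, exclusive_classes, inclusive=False))     # (20, 25)
```

An age of 19.5 years has no home among the inclusive classes, while the exclusive classes place it in 15-20 and place the overlapping value 20 in the next class.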


Remarks. 1. For ‘exclusive classes’ the presentation given in (*) is preferred since it does not lead to any confusion. However, if presentation (**) is used, there is slight confusion about the overlapping values, viz., 20, 25, 30, etc.; whenever such presentation is used (which is extensively done in practice) it should be clearly understood that the upper limit of the class is to be excluded from that class. From the above discussion it is also clear that the choice between the ‘inclusive method’ and the ‘exclusive method’ of classification will depend on the nature of the variable under study. For a discrete variable, the ‘inclusive classes’ may be used, while for a continuous variable the ‘exclusive classes’ are to be used.


2. However, sometimes, even for a continuous random variable the classification may be given to be of ‘inclusive type’. As an illustration, let us consider the following frequency distribution of age of a group of 50 individuals:

Age (on last birthday)    No. of persons (f)

Although the variable (age) X is a continuous variable, here inclusive type of classes are used since we are recording the age as on last birthday and consequently it becomes a discrete variable taking only integral values. Since age is a continuous variable, we might like to convert this ‘inclusive type’ classification into ‘exclusive type’ classification. Since the ages are recorded as on last birthday, a recorded age may understate the true age by almost one year. For example, in the age group 20-24 there may be a person (or many persons) with ages 24.1, 24.2, ..., up to 24.99. Thus all these persons, who have not yet completed 25 years, will be taken in the age group 20-24. Hence for obtaining ‘exclusive classes’ we can make a correction in the above distribution by converting 24 to 25. Accordingly, for continuous representation of the data (exclusive type), all the upper class limits will have to be increased by 1, thereby giving the following (exclusive type) distribution.


Age (on last birthday) (X)    No. of persons (f)


However, if the variable X is taken to denote the ‘age on next birthday’, then it would imply that the ages are recorded one year in advance (i.e., one year older than the existing age). This will mean that the class 20-25 may include person(s) with ages just higher than 19 also. As such, for continuity of the data the lower limit will have to be reduced by 1. Hence to obtain the ‘exclusive type’ classification for this case (X = age on next birthday, i.e., coming birthday), we shall have to subtract 1 from the lower limit of each class to get the following distribution:

Age (on next birthday) (X)    No. of persons (f)


3. As far as possible the class limits should start with zero or some convenient multiple of 5. As an illustration, if we want to form a frequency distribution of wages in a factory with a class interval of 10 and the lowest value of wages (per week) is given to be Rs. 43, then instead of having classes 43-53, 53-63, etc., a proper classification should be 40-50, 50-60, etc.

4. Class Boundaries. If in a grouped frequency distribution there are gaps between the upper limit of any class and the lower limit of the succeeding class (as in the case of inclusive type of classification), there is need to convert the data into a continuous distribution by applying a correction for continuity to determine new classes of exclusive type. The upper and lower class limits of the new ‘exclusive type’ classes are called class boundaries.

If d is the gap between the upper limit of any class and the lower limit of the succeeding class, the class boundaries for any class are then given by:

    $\text{Upper class boundary} = \text{Upper class limit} + \tfrac{1}{2}d$
    $\text{Lower class boundary} = \text{Lower class limit} - \tfrac{1}{2}d$        ...(3.6)

d/2 is called the correction factor.

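As an illustration, consider the following distribution of marks:

    Marks      Class Boundary
    20-24      19.5-24.5
    25-29      24.5-29.5
    30-34      29.5-34.5
    35-39      34.5-39.5
    40-44      39.5-44.5

Here d = 35 - 34 = 1, so that d/2 = 1/2 = 0.5.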


This technique enables us to convert a grouped frequency distribution (inclusive type) into a continuous frequency distribution and is extensively helpful in computing certain statistical measures like mode, median, etc. [see Chapter 5], which require the distribution to be continuous.

Thus in the above table, the lower class limits are 20, 25, 30, ..., 40 and the upper class limits are 24, 29, ..., 44, while the lower class boundaries are 19.5, 24.5, ..., 39.5 and the upper class boundaries are 24.5, 29.5, ..., 44.5.
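
The correction for continuity in (3.6) can be sketched as follows. The function name `class_boundaries` is an assumption made for this illustration; the class limits are those of the marks distribution discussed above.

```python
def class_boundaries(limits):
    """Convert inclusive class limits to boundaries using the correction d/2 of (3.6)."""
    d = limits[1][0] - limits[0][1]      # gap between consecutive classes
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]

limits = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
boundaries = class_boundaries(limits)
for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo}-{hi}: boundaries {lb}-{ub}")
```

The sketch assumes the gap d is the same between every pair of neighbouring classes, as it is for equal-width inclusive classes.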

5. Mid-value or Class Mark. As the name suggests, the mid-value or class mark is the value of the variable which is exactly at the middle of the class. The mid-value of any class is obtained on dividing the sum of the upper and lower class limits (or class boundaries) by 2. In other words:

    $\text{Mid-value of a class} = \tfrac{1}{2}\,[\text{Lower class limit} + \text{Upper class limit}]$
    $\qquad\qquad = \tfrac{1}{2}\,[\text{Lower class boundary} + \text{Upper class boundary}]$    ...(3.7)

In the table of Remark 4 above, it may be seen that the mid-values of the various classes are 22, 27, 32, 37, 42 respectively, as given below:

    $\tfrac{1}{2}(20 + 24) = 22$            $\tfrac{1}{2}(19.5 + 24.5) = 22$
    $\tfrac{1}{2}(25 + 29) = 27$            $\tfrac{1}{2}(24.5 + 29.5) = 27$
    $\tfrac{1}{2}(30 + 34) = 32$     or     $\tfrac{1}{2}(29.5 + 34.5) = 32$
    $\tfrac{1}{2}(35 + 39) = 37$            $\tfrac{1}{2}(34.5 + 39.5) = 37$
    $\tfrac{1}{2}(40 + 44) = 42$            $\tfrac{1}{2}(39.5 + 44.5) = 42$

It may be noted that whether we use class limits or class boundaries, the mid-values remain the same.


Important Note. For fixing the class limits the most important factor to be kept in mind is as given below:

“The class limits should be chosen in such a manner that the observations in any class are evenly distributed throughout the class interval so that the actual average of the observations in any class is very close to the mid-value of the class. In other words, this amounts to saying that the observations are concentrated at the mid-points of the classes.”

This is a very fundamental assumption in preparing a grouped or continuous frequency distribution for computation of various statistical measures like mean, variance, moments, etc. [see Chapters 5, 6, 7] in further analysis of the data. If this assumption is not true, then the classification will not reveal the main characteristics and will thus give a distorted picture of the distribution. The deviation from this assumption introduces the so-called ‘grouping error’.


(c) Open End Classes. The classification is termed ‘open end classification’ if the lower limit of the first class or the upper limit of the last class or both are not specified, and such classes in which one of the limits is missing are called ‘open end classes’. For example, classes like marks less than 20, age above 60 years, salary not exceeding Rupees 100 or salaries over Rupees 200, etc., are ‘open end classes’ since one of the class limits (lower or upper) is not specified in them. As far as possible, open end classes should be avoided, since in such classes the mid-value or class mark cannot be accurately obtained and this poses problems in the computation of various statistical measures for further processing of the data. Moreover, open end classes present problems in graphic presentation of the data also.

However, the use of open end classes is unavoidable in a number of practical situations, particularly relating to economic and medical data, where there are a few observations with extremely small or large values while most of the other observations are more or less concentrated in a narrower range. Thus we have to resort to open end classes for the frequency distribution of incomes, wages, profits, payment of income tax, savings, etc.


Remark. In case of open end classes, it is customary to estimate the class mark or mid-value for the first class with reference to the succeeding class (i.e., the 2nd class). In other words, we assume that the magnitude of the first class is the same as that of the second class. Similarly, the mid-value of the last class is determined with reference to the preceding class, i.e., the last but one class. This assumption will, of course, introduce some error in the calculation of further statistical measures (averages, dispersion, etc.; see Chapters 5, 6). However, if only a few items fall in the open end classes then:

(i) there won't be much loss of information in further processing of the data as a consequence of open end classes, and

(ii) the open end classes will not seriously reduce the utility of graphic presentation of the data.


Example 3.1. Form a frequency distribution from the following data by the Inclusive Method, taking 4 as the magnitude of class intervals:

10, 17, 15, [?], [?], 20, 19, 24, 29, 18
25, 26, 32, 14, 17, 20, 23, 27, 30, 12
15, 18, 24, 36, 18, 16, 21, 28, 38, [?]
34, 13, 10, 16, 20, 22, 29, 19, 28, [?]
(Delhi U., B. Com. 1977)


Solution. Since the minimum value of the variable is 10, which is a very convenient figure for taking as the lower limit of the first class, and the magnitude of the class intervals is given to be 4, the classes for preparing the frequency distribution by the ‘Inclusive Method’ will be 10-13, 14-17, 18-21, 22-25, ..., 34-37, 38-41, the last class being 38-41 because the maximum value in the distribution is 38.

To prepare the frequency distribution, since the first value 10 occurs in class 10-13 we put a tally mark against it; for the value 17 we put a tally mark against the class 14-17; for the value 15 we put a tally mark against the class 14-17; and so on. The final frequency distribution along with the tally marks is given below:

Example 3.2. Following figures relate to the weekly wages of workers in a factory.

Prepare a frequency table by taking a class interval of 5.
(Delhi U. B. Com. 1974)

Solution. In the above distribution, the minimum value of the variable X (wages in rupees) is 75 and the maximum value is 110. Moreover, the magnitude of the class intervals is given to be 5. Since ‘wages’ is a continuous variable, the frequency distribution with the ‘Exclusive Method’ would be appropriate. Since the minimum value 75 is a convenient figure to be taken as the lower limit of the first class, the class intervals may be taken as 75-80, 80-85, 85-90, ..., 110-115, the upper limit of each class being included in the next class. The frequency distribution is given below.

FREQUENCY DISTRIBUTION OF WAGES OF WORKERS IN A FACTORY

    Wages (in Rs.) (X)    No. of Workers (f)
    75-80                 [illegible]
    80-85                 12
    85-90                 15
    90-95                 [illegible]
    95-100                20
    100-105               [illegible]
    105-110               [illegible]
    110-115               2

    Total                 100


Example 3.3. Prepare a frequency distribution of the number of letters in a word from the following excerpt (ignore punctuation marks).

“In the beginning”, said a Persian Poet, “Allah took a rose, a lily, a dove, a serpent, a little honey, a Dead Sea Apple and a handful of clay. When he looked at the amalgam, it was a woman.”

Also obtain (i) the number of words with 6 letters or more, (ii) the proportion of words with 5 letters or less, and (iii) the percentage of words with number of letters between 2 and 8 (i.e., more than 2 but less than 8).


Solution. Let X denote the number of letters in each word in the excerpt given above. We note that in the above excerpt there are words with number of letters ranging from 1 to 9. Hence X takes the values from 1 to 9. For example, in the first word ‘In’ there are 2 letters; in the second word ‘the’ there are 3 letters; in the third word ‘beginning’ there are 9 letters; and so on. Thus the corresponding values of the variable X in the above excerpt are as given below:

2, 3, 9, 4, 1, 7, 4, 5, 4, 1, 4, 1, 4,
1, 4, 1, 7, 1, 6, 5, 1, 4, 3, 5, 3, 1,
7, 2, 4, 4, 2, 6, 2, 3, 7, 2, 3, 1, 5

The frequency distribution along with the tally marks is given in the table that follows.

(i) The number of words with 6 letters or more = 2 + 4 + 0 + 1 = 7.

(ii) The proportion of words with 5 letters or less is given by

    $\dfrac{9 + 5 + 5 + 9 + 4}{9 + 5 + 5 + 9 + 4 + 2 + 4 + 0 + 1} = \dfrac{32}{39} = 0.82$

(iii) The percentage of words with the number of letters between 2 and 8 is

    $\dfrac{5 + 9 + 4 + 2 + 4}{39} \times 100 = \dfrac{24}{39} \times 100 = 61.54$


FREQUENCY DISTRIBUTION OF NUMBER OF LETTERS IN A WORD

    Number of letters      Tally Marks     Frequency
    in a word (X)                          (f)
    1                      ||||/ ||||      9
    2                      ||||/           5
    3                      ||||/           5
    4                      ||||/ ||||      9
    5                      ||||            4
    6                      ||              2
    7                      ||||            4
    8                                      0
    9                      |               1

    Total                                  39
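
The counts in Example 3.3 can be reproduced with a short Python sketch. It assumes that ‘lily’ is spelt with four letters (which is what the printed frequencies imply) and that punctuation is dropped by keeping runs of letters only; the variable names are chosen for this illustration.

```python
import re
from collections import Counter

excerpt = (
    '"In the beginning", said a Persian Poet, "Allah took a rose, a lily, '
    "a dove, a serpent, a little honey, a Dead Sea Apple and a handful of clay. "
    'When he looked at the amalgam - it was a woman."'
)

# Keep alphabetic runs only, so punctuation marks are ignored.
lengths = [len(word) for word in re.findall(r"[A-Za-z]+", excerpt)]
freq = Counter(lengths)                      # X -> frequency f
n = len(lengths)

six_or_more = sum(f for x, f in freq.items() if x >= 6)
prop_five_or_less = sum(f for x, f in freq.items() if x <= 5) / n
pct_between = 100 * sum(f for x, f in freq.items() if 2 < x < 8) / n

print(sorted(freq.items()))
print(six_or_more, round(prop_five_or_less, 2), round(pct_between, 2))
```

The three summary figures correspond to parts (i), (ii) and (iii) of the example.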


Example 3.4. In a survey, it was found that 64 families bought milk in the following quantities in a particular month.


19 16 22 9 22 12 39 49.5 14:28 
6 24 16 18 7 17 20 25 28 18 


Using Sturges' rule, convert the above data into a frequency distribution by the ‘Inclusive Method’.

Solution. Here the total frequency N = 64. By Sturges' rule, the number of classes (k) is given by:

    $k = 1 + 3.322 \log_{10} 64 = 1 + 3.322 \times 1.8062 = 1 + 6.0002 = 7.0002 \approx 7$

    Range = Maximum value - Minimum value = 39 - 5 = 34

Hence the magnitude (i) of the class is given by

    $i = \dfrac{\text{Range}}{\text{Number of classes}} = \dfrac{34}{7} = 4.86 \approx 5$

Hence, taking the magnitude of each class interval as 5, we shall get 7 classes. Since the lowest value is 5, which is quite a convenient figure for being taken as the lower limit of the first class, the various classes by the inclusive method would be

5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39.

Using tally marks, the required frequency distribution is obtained in the following table.


FREQUENCY DISTRIBUTION OF THE MILK AMONG 64 FAMILIES

Milk quantity (C.I.)    Tally Marks    Number of families (f)

[Table entries not recoverable.]

Example 3.5. A college management wanted to give scholarships to B. Com. students securing 60 per cent and above marks in the following manner:

    Percentage of Marks    Monthly Scholarship in Rs.
    60-65                  25
    65-70                  30
    70-75                  35
    75-80                  40
    80-85                  45


The marks of 25 students who were eligible for scholarship are 
given below : 

74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60, 67, 74, 66, 64, 

79, 73, 75, 76, 69, 68, 78 and 67. 

Calculate the monthly scholarship paid to the students. 


Solution. As we are given the amount of scholarships according to the percentage of marks of the students within classes 60-65, 65-70, ..., 80-85, we shall convert the given distribution of marks into a frequency distribution with these classes.

FREQUENCY DISTRIBUTION OF MARKS OF 25 STUDENTS

    Percentage    Tally Marks    No. of Students    Scholarship (in Rs.)    Total Amount
    of marks                     (f)                (X)                     fX
    60-65         ||||/ ||       7                  25                      175
    65-70         ||||/          5                  30                      150
    70-75         ||||/ |        6                  35                      210
    75-80         ||||           4                  40                      160
    80-85         |||            3                  45                      135

    Total                        Σf = 25                                    ΣfX = 830

Total monthly scholarship paid to the students is ΣfX = Rs. 830.
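
The calculation of Example 3.5 can be checked with a short Python sketch. The marks and scholarship rates are those given in the example; the variable names are chosen here for illustration, and the classes are treated as exclusive (upper limit excluded).

```python
marks = [74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60,
         67, 74, 66, 64, 79, 73, 75, 76, 69, 68, 78, 67]

# (lower limit, upper limit excluded, monthly scholarship in Rs.)
slabs = [(60, 65, 25), (65, 70, 30), (70, 75, 35), (75, 80, 40), (80, 85, 45)]

total = 0
frequencies = []
for lower, upper, amount in slabs:
    f = sum(lower <= m < upper for m in marks)   # class frequency
    frequencies.append(f)
    total += f * amount
    print(f"{lower}-{upper}: f = {f}, fX = {f * amount}")
print("Total monthly scholarship (Rs.):", total)
```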
Example 3.6. If the class mid-points in a frequency distribution of age of a group of persons are 25, 32, 39, 46, 53 and 60, find:
(i) the size of the class interval,
(ii) the class boundaries, and
(iii) the class limits, assuming that the age quoted is the age completed last birthday. [Osmania Univ. B. Com. (April) 1978]

Solution. (i) The size of the class interval = Difference between the mid-values of any two consecutive classes = 7 [since 32 - 25 = 39 - 32 = ... = 60 - 53 = 7].

(ii) Since the magnitude of the class is 7 and the mid-values of the classes are 25, 32, ..., 60, the corresponding class boundaries for the different classes are obtained on adding (for upper class boundaries) and subtracting (for lower class boundaries) half the magnitude of the class interval, viz., 7/2 = 3.5, to and from the mid-value respectively. For example, the class boundaries for the first class will be (25 - 3.5, 25 + 3.5), i.e., (21.5, 28.5); for the second class they will be (32 - 3.5, 32 + 3.5), i.e., (28.5, 35.5); and so on. Thus the various classes (Inclusive Type) with class boundaries are as given in the following table:

    Class boundaries    Mid-value
    21.5-28.5           25
    28.5-35.5           32
    35.5-42.5           39
    42.5-49.5           46
    49.5-56.5           53
    56.5-63.5           60
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(iii) Assuming the age quoted (X) isthe аре completed on 
last birthday then X will bea discrete variable which can take. 
only integral values. Hence the given distribution can be expressed 
in an ‘inclusive type’ of classes with class interval of magnitude dy 
as given in the following table : 


Mid-point 


Age 
(on last birth day) 


(For details see Remark 2 $344). 
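The steps of Example 3.6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mid-points are equally spaced, the variable takes integer values, and the class width is odd (as it is here), so that the inclusive limits are whole numbers.

```python
mid_points = [25, 32, 39, 46, 53, 60]

# (i) Class width = common difference between consecutive mid-points.
width = mid_points[1] - mid_points[0]
assert all(b - a == width for a, b in zip(mid_points, mid_points[1:]))

# (ii) Class boundaries: mid-point minus/plus half the width.
boundaries = [(m - width / 2, m + width / 2) for m in mid_points]

# (iii) Inclusive class limits for an integer-valued variable:
# the `width` consecutive integers centred on the mid-point
# (assumes an odd width, so that (width - 1) / 2 is a whole number).
half = (width - 1) // 2
limits = [(m - half, m + half) for m in mid_points]

print("width:", width)
for m, b, l in zip(mid_points, boundaries, limits):
    print(m, b, l)
```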


Example 3.7. The following table shows the distribution of 
the life time of 350 radio tubes. 


Life time        No. of tubes        Life time        No. of tubes
(in hours)                           (in hours)
300–400          6                   700–800          62
400–500          18                  800–900          22
500–600          73                  900–1000         4
600–700          165


Stating clearly the assumptions involved, obtain the percentage 
of tubes that have life time : 


(i) Greater than 760 hours 
(ii) Between 650 and 850 hours 
(iii) Less than 530 hours. 


Solution. Under the assumption that the class frequencies 
are uniformly distributed within the corresponding classes, we 
obtain by simple interpolation technique : 

(i) Number of tubes with life time over 760 hours

= 4 + 22 + (62/100 × 40) = 26 + 24.8 = 50.8 ≈ 51,
since number of tubes cannot be fractional.

Hence required percentage of tubes

= (51/350) × 100 = 14.57

(ii) Number of tubes with life time over 650 hours

= 4 + 22 + 62 + (165/100 × 50) = 88 + 82.5 = 170.5 ≈ 171

Number of tubes with life time over 850 hours

= 4 + (22/100 × 50) = 4 + 11 = 15

Hence the number of tubes with life time between 650 hours
and 850 hours is 171 − 15 = 156.

The required percentage of tubes

= (156/350) × 100 = 44.57

(iii) Number of tubes with life less than 530 hours

= 6 + 18 + (73/100 × 30) = 6 + 18 + 21.9 = 45.9 ≈ 46

Hence required percentage of tubes

= (46/350) × 100 = 13.14
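The interpolation used in Example 3.7 can be sketched in Python as follows. This is an illustrative sketch only: the helper name count_below is not from the text, and the code keeps the interpolated counts unrounded, whereas the worked solution rounds each count to a whole tube before taking percentages, so the percentages printed here differ slightly from those above.

```python
# (lower limit, upper limit, frequency) for each class of life time in hours.
classes = [(300, 400, 6), (400, 500, 18), (500, 600, 73), (600, 700, 165),
           (700, 800, 62), (800, 900, 22), (900, 1000, 4)]

N = sum(f for _, _, f in classes)


def count_below(x):
    """Estimated number of tubes with life time below x hours, assuming
    frequencies are uniformly spread within each class."""
    total = 0.0
    for lower, upper, f in classes:
        if x >= upper:
            total += f                                   # whole class lies below x
        elif x > lower:
            total += f * (x - lower) / (upper - lower)   # part of the class
    return total


over_760 = N - count_below(760)
between_650_850 = count_below(850) - count_below(650)
below_530 = count_below(530)

for label, count in [("over 760", over_760),
                     ("650 to 850", between_650_850),
                     ("below 530", below_530)]:
    print(label, round(count, 1), "tubes,", round(100 * count / N, 2), "per cent")
```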


3.5. Cumulative Frequency Distribution. A frequency
distribution simply tells us how frequently a particular value of the
variable (class) is occurring. However, if we want to know the
total number of observations getting a value ‘less than’ or ‘more
than’ a particular value of the variable (class), this frequency table
fails to furnish the information as such. This information can be
obtained very conveniently from the ‘cumulative frequency
distribution’, which is a modification of the given frequency distribution
and is obtained on successively adding the frequencies of the values
of the variable (or classes) according to a certain law. The
frequencies so obtained are called the cumulative frequencies,
abbreviated as c.f. The laws used are of ‘less than’ and ‘more than’ type,
giving rise to the ‘less than cumulative frequency distribution’ and the
‘more than cumulative frequency distribution’. We shall explain the
construction of such distributions by means of a numerical illustration.

Let us consider the following distribution of marks of 70
students in a test :

Marks        No. of Students
30–35        5
35–40        10
40–45        15
45–50        30
50–55        5
55–60        5
Total        70


3.5.1. Less Than Cumulative Frequency. The ‘less than’
cumulative frequency for any value of the variable (or class) is
obtained on adding successively the frequencies of all the previous
values (or classes), including the frequency of the variable (class)
against which the totals are written, provided the values (classes)
are arranged in ascending order of magnitude. For instance, in the
above illustration, the total number of students with marks less
than, say, 40 is 5+10=15 ; ‘less than 50’ is the sum of all the
previous frequencies up to and including the class 45–50, i.e.,
5+10+15+30=60 and so on. The final distribution is given below :

TABLE 3.7
‘LESS THAN’ CUMULATIVE FREQUENCY
DISTRIBUTION OF MARKS OF 70 STUDENTS

Marks        Frequency        ‘Less than’
             (f)              Cumulative Frequency (c.f.)
30–35        5                5
35–40        10               5+10=15
40–45        15               15+15=30
45–50        30               30+30=60
50–55        5                60+5=65
55–60        5                65+5=70

The above ‘less than’ cumulative frequency distribution can
also be written as follows :

Marks                Frequency
Less than 30         0
  ,,   ,,  35        5
  ,,   ,,  40        15
  ,,   ,,  45        30
  ,,   ,,  50        60
  ,,   ,,  55        65
  ,,   ,,  60        70

3.5.2. More Than Cumulative Frequency. The ‘more than
cumulative frequency’ is obtained similarly by finding the cumulative
totals of frequencies starting from the highest value of the
variable (class) to the lowest value (class). Thus in the above
illustration the number of students with marks ‘more than 50’ is
5+5=10, and ‘more than 40’ is 15+30+5+5=55 and so on. The
complete ‘more than’ type cumulative frequency distribution for this
data is given below :

TABLE 3.8
‘MORE THAN’ CUMULATIVE FREQUENCY
DISTRIBUTION OF MARKS OF 70 STUDENTS

Marks        Frequency        ‘More than’
             (f)              cumulative frequency (c.f.)
30–35        5                65+5=70
35–40        10               55+10=65
40–45        15               40+15=55
45–50        30               10+30=40
50–55        5                5+5=10
55–60        5                5

The above ‘more than’ c.f. distribution can also be expressed in
the following form :

Marks                No. of students
More than 30         70
  ,,   ,,  35        65
  ,,   ,,  40        55
  ,,   ,,  45        40
  ,,   ,,  50        10
  ,,   ,,  55        5

Remarks 1. In fact ‘less than’ and ‘more than’ words also
include the equality sign, i.e., ‘less than a given value’ means ‘less
than or equal to that value’ and ‘more than a given value’ means
‘more than or equal to that value.’

2. Cumulative frequency distribution is of particular importance
in the computation of median, quartiles and other partition
values of a given frequency distribution. [For details see Chapter 5,
Averages].

3. In ‘less than’ cumulative frequency distribution, the c.f.
refers to the upper limit of the corresponding class and in ‘more
than’ cumulative frequency distribution, the c.f. refers to the lower
limit of the corresponding class.
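Both types of cumulative frequency for the marks of the 70 students can be obtained as running totals. The sketch below is only an illustration in Python (the variable names are assumptions, not the book's notation): the ‘less than’ c.f. accumulates from the lowest class upwards and the ‘more than’ c.f. from the highest class downwards.

```python
from itertools import accumulate

classes = [(30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
freq = [5, 10, 15, 30, 5, 5]

# 'Less than' c.f.: running total from the first class; each total
# refers to the upper limit of its class.
less_than_cf = list(accumulate(freq))

# 'More than' c.f.: running total from the last class backwards; each
# total refers to the lower limit of its class.
more_than_cf = list(accumulate(reversed(freq)))[::-1]

for (lower, upper), f, lt, mt in zip(classes, freq, less_than_cf, more_than_cf):
    print(f"{lower}-{upper}: f={f}, less than {upper}: {lt}, more than {lower}: {mt}")
```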


Classification and Tabulation 89 


Example 3.8. Convert the following distribution into ‘more
than’ frequency distribution.

Weekly wages less than (Rs.)        No. of Workers
20                                  41
40                                  92
60                                  156
80                                  194
100                                 201

(Delhi Univ. B. Com. 1979)

Solution. Here we are given ‘less than’ cumulative frequency
distribution. To obtain the ‘more than’ cumulative frequency
distribution, we shall first convert it into continuous frequency distribution
as shown in the following table :

Weekly Wages        No. of Workers           ‘More than’
(Rs.)               (f)                      cumulative frequency
0–20                41                       160+41=201
20–40               92 − 41 = 51             109+51=160
40–60               156 − 92 = 64            45+64=109
60–80               194 − 156 = 38           38+7=45
80–100              201 − 194 = 7            7

Weekly Wages                No. of Workers
More than (Rs.)
0                           201
20                          160
40                          109
60                          45
80                          7

Example 3.9. The credit office of a department store gave
the following statements for payment due to 40 customers. Construct
a frequency table of the balances due taking the class intervals as
Rs. 50 and under Rs. 200, Rs. 200 and under Rs. 350 etc. Also find
the percentage cumulative frequencies and interpret these values.

Balance due in Rs.
337, 570, 99, 759, 487, 352, 115, 60, 521, 95
563, 399, 625, 215, 360, 178, 827, 301, 501, 199
110, 501, 201, 99, 637, 328, 539, 150, 417, 250
451, 595, 422, 344, 186, 681, 397, 790, 272, 514
(Bombay U. B. Com. Nov. 1982)

Solution. Taking the class intervals as 50–200, 200–350,
etc., and using tally marks, we obtain the following distribution
of the balance due in Rs. from 40 customers.

FREQUENCY TABLE OF BALANCE DUE IN
RUPEES TO 40 CUSTOMERS

Balance due      No. of customers      ‘Less than’      Percentage
(in Rs.)         (f)                   c.f.             c.f.
50–200           10                    10               25.0
200–350          8                     18               45.0
350–500          8                     26               65.0
500–650          10                    36               90.0
650–800          3                     39               97.5
800–950          1                     40               100.0

The last column of the percentage cumulative frequencies shows
that 25% of the customers have to pay less than Rs. 200 ; 45% of the
customers have to pay less than Rs. 350 ; 65% of the customers have
to pay less than Rs. 500 ; 90% of the customers have to pay less
than Rs. 650 ; 97.5% of the customers have to pay less than Rs. 800
and the balance due is less than Rs. 950 from all the 40 customers,
i.e., no customer has to pay more than Rs. 950.
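The grouping and the percentage cumulative frequencies of Example 3.9 can be reproduced with a short Python sketch. The names LOWER, WIDTH and K are illustrative; classes are taken as ‘Rs. 50 and under Rs. 200’ and so on, as the example specifies.

```python
from itertools import accumulate

balances = [337, 570, 99, 759, 487, 352, 115, 60, 521, 95,
            563, 399, 625, 215, 360, 178, 827, 301, 501, 199,
            110, 501, 201, 99, 637, 328, 539, 150, 417, 250,
            451, 595, 422, 344, 186, 681, 397, 790, 272, 514]

LOWER, WIDTH, K = 50, 150, 6   # first lower limit, class width, number of classes

# Class index of a balance b is (b - LOWER) // WIDTH, since each class
# includes its lower limit and excludes its upper limit.
freq = [0] * K
for b in balances:
    freq[(b - LOWER) // WIDTH] += 1

cum = list(accumulate(freq))
pct_cum = [100 * c / len(balances) for c in cum]

for i in range(K):
    lo = LOWER + i * WIDTH
    print(f"{lo}-{lo + WIDTH}: f={freq[i]}, c.f.={cum[i]}, % c.f.={pct_cum[i]}")
```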


3.6. Bivariate Frequency Distribution. So far our study
was confined to frequency distribution of a single variable only.
Such frequency distributions are also called univariate frequency
distributions. Quite often we are interested in simultaneous study of
two variables for the same population. This amounts to classifying
the given population w.r.t. two bases or criteria simultaneously. For
example, we may study the weights and heights of a group of
individuals, the marks obtained by a group of individuals on two
different tests or subjects, income and expenditure of a group of
individuals, ages of husbands and wives for a group of couples etc.
The data so obtained as a result of this cross classification give rise
to the so called bivariate frequency distribution and it can be
summarised in the form of a two-way table called the bivariate frequency
table or, commonly, the correlation table. Here also the values of
each variable are grouped into various classes (not necessarily the
same for each variable) keeping in view the same considerations of
classification as for a univariate distribution. If the data
corresponding to one variable, say, X is grouped into m classes and the
data corresponding to the other variable, say, Y is grouped into n
classes then the bivariate table will consist of m × n cells. By going
through the different pairs of the values (x, y) of the variables and
using tally marks we can find the frequency for each cell and thus
obtain the bivariate frequency table. The format of a bivariate
frequency table is given below :


TABLE 3.9

BIVARIATE FREQUENCY TABLE

                        X (classes and mid points)            Total of
Y (classes and          x1     x2     ...     xm              frequencies
mid points)                                                   of Y
y1
y2                              f(x, y)                       fy
...
yn
Total of
frequencies of X                fx                            Total Σfx = Σfy = N

Here f(x, y) is the frequency of the pair (x, y).


Remark. The bivariate frequency table gives a general
visual picture of the relationship between the two variables under
consideration. However, a quantitative measure of the linear
relationship between the variables is given by the correlation coefficient
(See Chapter 8, Correlation Analysis).

We shall now explain the technique of constructing bivariate
frequency table by means of numerical illustrations.

Example 3.10. Prepare a bivariate frequency distribution
for the following data for 20 students :

Marks in Law         10 11 10 11 11 14 12 12 13 10
Marks in Statistics  20 21 22 21 23 23 22 21 24 23
Marks in Law         13 12 11 12 10 14 14 12 13 10
Marks in Statistics  24 23 22 23 22 22 24 20 24 23
(Delhi Univ. B. Com. 1980)

Solution. Let us denote the marks in Law by the variable
X and the marks in Statistics by the variable Y. Then X takes the
values from 10 to 14, i.e., 5 values in all, and Y takes the values from
20 to 24, i.e., 5 values in all. Thus the two-way table will consist of
5 × 5 = 25 cells.

To prepare the bivariate frequency table, we observe that the
first student gets 10 marks in Law and 20 marks in Statistics.
Therefore, we put a tally mark in the cell where the column corresponding
to X=10 intersects the row corresponding to Y=20. Proceeding
similarly we put tally marks for each pair of values (x, y) for all the
20 candidates. The total frequency for each cell is given in small
brackets ( ), after the tally marks. Now count all the frequencies
in each row and write the total in the extreme right column. Similarly
count all the frequencies in each column and write the total in the
bottom row. The bivariate frequency distribution so obtained is given
in the following table :


BIVARIATE FREQUENCY TABLE SHOWING MARKS 
OF 20 STUDENTS IN LAW AND STATISTICS 


Marks in 
Law 
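The tallying procedure described for Example 3.10 can be sketched in Python with a counter keyed by the pair (x, y). This is an illustration only. The data lists below are an assumed reading of the example: three entries in this copy are misprinted (one Law mark and two Statistics marks fall outside the stated ranges 10 to 14 and 20 to 24) and are taken here as 13, 23 and 23 respectively, and the first Statistics mark is taken as 20, as the solution states.

```python
from collections import Counter

# Assumed reading of the 20 pairs of marks (see the note above).
law = [10, 11, 10, 11, 11, 14, 12, 12, 13, 10,
       13, 12, 11, 12, 10, 14, 14, 12, 13, 10]
statistics = [20, 21, 22, 21, 23, 23, 22, 21, 24, 23,
              24, 23, 22, 23, 22, 22, 24, 20, 24, 23]

# One "tally mark" per student in the cell (x, y).
cell = Counter(zip(law, statistics))

x_values = sorted(set(law))
y_values = sorted(set(statistics))

# Marginal totals: column totals for X, row totals for Y.
col_total = {x: sum(cell[(x, y)] for y in y_values) for x in x_values}
row_total = {y: sum(cell[(x, y)] for x in x_values) for y in y_values}

print("Y\\X", *x_values, "Total")
for y in y_values:
    print(y, *(cell[(x, y)] for x in x_values), row_total[y])
print("Total", *(col_total[x] for x in x_values), sum(cell.values()))
```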


Example 3.11. Following figures give the ages in years of newly
married husbands and wives. Represent the data by a frequency
distribution.

Age of Husband : 24 26 27 25 28 24 27 28 25 26

Age of Wife : 17 18 19 17 20 18 18 19 18 19

Age of Husband : 25 26 27 25 27 26 25 26 26 26

Age of Wife : 17 18 19 19 20 19 17 20 17 18

[Delhi Univ. B. Com. (Hons.) 1975]
Solution. Let us denote the age (in years) of the husbands by
the variable X and the age (in years) of wives by the variable Y.
Then we observe that the variable X takes the values from 24 to 28
and Y takes the values from 17 to 20. Proceeding exactly as in the
above example, we obtain the following bivariate frequency
distribution.


FREQUENCY DISTRIBUTION OF THE AGES (IN YEARS) 
OF NEWLY MARRIED HUSBANDS AND WIVES 


[Table : age of wife (Y), 17 to 20, in rows against age of husband (X),
24 to 28, in columns, with tally marks and frequencies in each cell
and row and column totals ; cell entries not legible.]

Example 3.12. The data given below relate to the marks obtained
by 20 students in two subjects. Prepare a two-way frequency table
with class-intervals 62–64, 64–66 and so on for subject A and 115–125,
125–135, and so on for subject B.

[Table : serial number, marks in subject A and marks in subject B
for the 20 students ; entries not legible.]

(Bombay Univ. B. Com. April 1978)


Solution. Let the marks in subjects A and B be denoted by
the variables X and Y respectively and let the various classes of X
be taken along the 1st row and the classes of Y along the 1st column.
The first pair of marks is (152, 67). Since 152 is in class 145–155
and 67 lies in class 66–68, we put a tally mark in the cell where the
column corresponding to the class 145–155 intersects the row
66–68. Proceeding similarly for all the pairs of marks of 20 students,
we obtain the following bivariate frequency table, where values in
the brackets ( ) after tally marks denote the frequencies of the
corresponding cells.

TWO-WAY FREQUENCY TABLE SHOWING MARKS OF
20 STUDENTS IN TWO SUBJECTS A AND B


Remark. Here the classification is of ‘Exclusive Type’. The 
upper limit of any class is included in the succeeding class. 


EXERCISE 3.1 


1. (a) What do you understand by classification of data ? What are its
objectives ?

(b) What are the basic principles of a good classification ?

(c) What are different types of classification ? Illustrate by suitable
examples.

2. “Classification is the process of arranging things (either actually or
notionally) in groups or classes according to their resemblances and affinities,
giving expression to the unity of attributes that may subsist amongst a diversity
of individuals”.

Elucidate the above statement.

3. What are the objectives of classification ? Discuss different methods
of classification. [Osmania U. B. Com. (Hons.) April 1983]

4. (a) What are grouped and ungrouped frequency distributions ? What
are their uses ? What are the considerations that one has to bear in mind while
forming the frequency distribution ?
(b) Briefly outline the considerations you will bear in mind while
constructing a frequency distribution.
[Delhi U. B. Com. (Hons.) 1982]

5. Discuss the problems in the construction of a frequency distribution
from raw data, with particular reference to the choice of number of classes and
class limits.

6. What are the principles governing the choice of :

(i) Number of class intervals.
(ii) The length of the class interval.
(iii) The mid point of class interval ?

7. What are the general rules of forming a frequency distribution with
particular reference to the choice of class-interval and number of classes ?
Illustrate with examples.


8. Prepare a frequency distribution from the following figures relating to 
bonus paid to factory workers : 


BONUS PAID TO WORKERS (in Rs.) 


Take a class-interval of 5, (Delhi U. B. Com. 1976) 
Ans. Frequencies of classes 50—54, 55—59,...,100—104, 105—109 are 
respectively 1, 10, 11, 4, 11, 5, 13, 9, 15, 2, 8 and 1. 
9. The following table gives the purchases made by 50 customers at a
departmental store. Form a frequency distribution, taking the class intervals as
0–9.99, 10–19.99, 20–29.99, and so on.
35.00  8.25 19.00 81.40 52.00 41.25 21.25 32.40
60.00 45.50 49.75 43.75 39.10  2.20 25.00 57.50
36.30 65.00 30.00 55.00 70.00 38.25 29.25 52.25
46.50 66.00 36.50 45.25  5.60 56.75 59.00 39.00
33.50 59.00  4.00 17.60 23.50 44.50 39.50 79.00
39.75 20.00 66.50 36.50 47.80 37.25 37.75 50.00
26.00 42.50
(Bombay U. B. Com. 1974)
Ans. 5, 2, 6, 13, 9, 8, 4, 2, 1.
10. The following are the weights in kilograms of a group of 55 
students. 
44. 4. 0 10 . 82.15. AL 6G 3$. ЗС 69 
53. 10 756. C84. 1 50261.) ДЕ nt 263 065.198 
68 69 104 80 79 79 54 73 59 81 100 
66 49 77 90 84 76 42 64 69 7 80 
22...50.- 79. 11500103196 151-8620 1815941 Т 


Prepare a frequency table taking the magnitude of each class-interval as 
10 kg. and the first class-interval as equal to 40 and less than 50. 
(Delhi U. B. Com. 1977) 
Ans. Frequencies of classes 40—50, 50—60, ..., 110—120 are 5, 7, 11, 15, 
8, 4, 3, 2 respectively. 
11. Prepare a statistical table from the following data taking the class 
width as 7 by inclusive method.
€. 236 28 32. 87 5 1 7 9
13 14 18 29 USE 32 6 4 2 Seis 
27 36 3 933157272 33 4 
16 20 5 10 3 8 1 6 4 9 2 
ОИ О 26-025 15 17 28 
(Delhi U. B. Com. 1978) 


Ans. Frequencies of classes 1—7, 8—14, 15—21, ..., 36—42 are respec- 
tively 15, 12, 11,9, 6,2. 


12. Using Sturges’ Rule k = 1 + 3.322 log N, where k is the number of
class intervals and N is the total number of observations, classify in equal
intervals the following data of hours worked by 50 piece rate workers for a
period of a month in a certain factory :

110, 175, 161, 157, 155, 108, 164, 128, 114, 178, 165, 133, 195, 151, 71, 94, 97,
42, 30, 62, 138, 156, 167, 124, 164, 146, 116, 149, 104, 141, 103, 150, 162,
149, 79, 113, 69, 121, 93, 143, 140, 144, 187, 184, 197, 87, 40, 122, 203, 148.

Ans. Using Sturges’ rule we get k (No. of classes) = 7, Magnitude of
class = Range/k = 174/7 ≈ 25. Classes are 30–55, 55–80, 80–105, ..., 180–205.
The corresponding frequencies are 3, 4, 6, 9, 12, 11, 5.


13. If the class mid-points in a frequency distribution of a group of
persons are : 125, 132, 139, 146, 153, 160, 167, 174, 181 pounds, find (i) size of
the class intervals, (ii) the class boundaries, and (iii) the class limits assuming
that the weights are measured to the nearest pound.

[Delhi U. B.A. Eco. (Hons.) 1973]

Ans. (i) 7. (ii) 121.5–128.5, 128.5–135.5, ..., 177.5–184.5.
(iii) 122–128, 129–135, ..., 178–184.

14. With the help of suitable examples, distinguish between :

(i) Continuous and Discrete variable.

(ii) Exclusive and Inclusive class intervals.
(iii) ‘More than’ and ‘Less than’ frequency tables.
(iv) Simple and Bivariate frequency tables.

(Punjab U. B. Com. Sept. 1980 ; Meerut U. B. Com. April 1977)

15. What do you mean by cumulative frequency (c.f.) distribution ?
Explain ‘More than’ and ‘Less than’ type c.f. distributions. Illustrate by an example.


16. The weekly observations on cost of living index in a certain city for
the year 1970-71 are given below :
Cost of living
Index : 140–150 150–160 160–170 170–180 180–190 190–200
No. of workers : 5 10 20 9 6
Prepare ‘less than’ and ‘more than’ cumulative frequency distributions.
17. Following is a cumulative distribution table showing the number of
packages and the number of times a given number of packages was received
by a post office in 60 days :
No. of packages
below : 10 20 30 40 50 60
No. of times
received in 60 days : 17 22 29 37 50 60
Obtain the frequency table from it. Also prepare ‘more than’ cumulative
frequency table.


18. (a) What is the difference between Continuous and Discrete
Variables ? (B. Com. 1977)

(b) Are the following variables discrete or continuous ? Give your
answer with reason.
(i) Age on last birthday.
(ii) Temperature of the patient.
(iii) Length of a room.
(iv) Number of shareholders in a company. (Delhi U. B. Com. 1980)
Ans. (ii) and (iii) continuous ; (i) and (iv) discrete.
(c) State with reasons which of the following represent discrete data and
which represent continuous data :
(i) Number of Table Fans sold each day at a Departmental store.
(ii) Temperature recorded every half an hour of a patient in a hospital.
(iii) Life of television tubes produced by Electronics Ltd.
(iv) Yearly income of school teachers.
(v) Lengths of 1,000 bolts produced in a factory.
19. Complete the table showing the frequencies with which words of
different number of letters occur in the extract reproduced below (omitting
punctuation marks) treating as the variable the number of letters in each word :


“Her eyes were blue: blue as autumn distance—blue as the blue we see 
between the retreating mouldings of hills and woody slopes on a sunny Septem- 
ber morning : a misty and shady blue, that had no beginning or surface, and was 
looked into rather than at.” 


Ans EX 5 1623 Со CR РЕС) 
Yin gy See al ЕЗ УЛДА ЧӘ 1 


20. Prepare a frequency distribution of the words in the following
extract according to their length (number of letters) omitting punctuation
marks. Also give (i) the number of words with 7 letters or less ; (ii) the
proportion of words with 5 letters or more ; (iii) the percentage of words with not
less than 4 and not more than 7 letters.

“Success in the examination confers no absolute right to appointment,
unless Government is satisfied, after such enquiry as may be considered
necessary, that the candidate is suitable in all respects for appointment to the
public service.”

Ans. X : 1  2  3  4  5  6  7  8  9  10  11
     f : 0  9  6  2  2  2  4  3  3  2   3

(i) 25 (ii) 19/36 = 0.53 (iii) 10/36 × 100 = 27.78.
21. A company wants to pay bonus to its employees. The bonus is to
be paid as under :


Salary (Rs.) Bonus (Rs.) Salary (Rs.) Bonus (Rs.) 
100—200 10 400—500 40 
200—300 20 500—600 50 
300—400 30 600—700 60 


Actual salaries of the employees, in Rupees, are as under : 


175, 225, 375, 478, 525, 650, 570, 451, 382, 280, 
375, 465, 530, 480, 320, 515, 225, 345, 471, 450 


Find out the total bonus paid to the employees.

22. From the following data construct a bivariate frequency
distribution :

Age of husbands   Age of wives   Age of husbands   Age of wives
(in years)        (in years)     (in years)        (in years)

(Delhi U. B. Com. 1979) 


Ans. x : 25 26 27 28 y:19 20 21 22 
gate 46.5 3 | $533 416-2 


23 The following figures are income (x) and percentage expenditure 


on food (y) in 25 families. Construct a bivariate frequency table classifying x
into intervals 200–300, 300–400, ... and y into 10–15, 15–20.


550 12 225 25,5. 16800. 13 202 29 680 1l 
623 14 310 26 300 25 255 215 5235.12 
310 — 18 640 20 225 16 492 13 317.45 
420 16 512 18 5155551 `15 587 21} e 
600 15 690 120000232509 023 643 19 400 


(Bombay U. B. Com. April 1981) 
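Sturges’ rule, quoted in Exercise 12 above, lends itself to a short computational check. The following Python sketch is only an illustration under stated assumptions: log is taken to base 10, k is rounded to the nearest whole number, the class width is the range divided by k rounded up, and the first class starts at the smallest observation. On this reading the range computes to 203 − 30 = 173, which still gives a width of 25.

```python
import math

hours = [110, 175, 161, 157, 155, 108, 164, 128, 114, 178, 165, 133, 195, 151,
         71, 94, 97, 42, 30, 62, 138, 156, 167, 124, 164, 146, 116, 149, 104,
         141, 103, 150, 162, 149, 79, 113, 69, 121, 93, 143, 140, 144, 187,
         184, 197, 87, 40, 122, 203, 148]

n = len(hours)

# Sturges' rule: k = 1 + 3.322 log10(N), rounded to a whole number of classes.
k = round(1 + 3.322 * math.log10(n))

# Class width: range divided by k, rounded up to a whole number.
data_range = max(hours) - min(hours)
width = math.ceil(data_range / k)

# Exclusive-type classes starting at the smallest observation.
lower = min(hours)
freq = [0] * k
for h in hours:
    freq[(h - lower) // width] += 1

for i, f in enumerate(freq):
    lo = lower + i * width
    print(f"{lo}-{lo + width}: {f}")
```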


3.5. Tabulation-Meaning and Importance. By tabulation
we mean the systematic presentation of the information contained
in the data, in rows and columns in accordance with some salient
features or characteristics. Rows are horizontal arrangements and
columns are vertical arrangements. In the words of A.M. Tuttle :

“A statistical table is the logical listing of related quantitative
data in vertical columns and horizontal rows of numbers with sufficient
explanatory and qualifying words, phrases and statements in the form
of titles, headings and notes to make clear the full meaning of the data
and their origin.”

Professor Bowley in his manual of Statistics refers to
tabulation as “the intermediate process between the accumulation of data
in whatever form they are obtained, and the final reasoned account of
the result shown by the statistics.”

Tabulation is one of the most important and ingenious devices
of presenting the data in a condensed and readily comprehensible
form and attempts to furnish the maximum information contained
in the data in the minimum possible space, without sacrificing the
quality and usefulness of the data. It is an intermediate process
between the collection of the data on one hand and statistical analysis
on the other hand. In fact, tabulation is the final stage in collection
and compilation of the data and forms the gateway for further
statistical analysis and interpretations. Tabulation makes the data
comprehensible and facilitates comparisons (by classifying data into
suitable groups) and the work of further statistical analysis,
averaging, correlation etc. It makes the data suitable for further
diagrammatic and graphic representation.

If the information contained in the data is expressed as a
running text using language paragraphs, it is quite time-consuming
to comprehend it because in order to understand every minute
detail of the text, one has to go through all the paragraphs, which
usually contain a very large amount of repetition. Tabulation
overcomes the drawback of the repetition of explanatory phrases and
headings and presents the data in a neat, readily comprehensible
and true perspective, thus highlighting the significant and relevant
details and information. Tabulated data have an attractive get-up
and leave a lasting impression on the mind as compared to the data
in the textual form. Tabulation also facilitates the detection of the
errors and the omissions in the data. Tabulation enables us to
draw the attention of the observer to specific items by means of
comparisons, emphasis and arrangement of the layout.


No hard and fast rules can be laid down for tabulating the 
statistical data. To prepare a first class table one must have a clear 
idea about the facts to be presented and stressed, the points on 
which emphasis is to be laid and familiarity with technique of pre- 
paration of the table. The arrangement of data tabulation requires 
considerable thought to ensure showing the relationship between the 
data of one or more series, as well as the significance of all the figures 
given in the classification adopted. The facts, comparisons and 
contrasts, and emphasis vary from one table to another table. 
Accordingly a good table (the requirements of which are given 
below) can only be obtained through the skill, expertise, experience 
and common sense of the tabulator, keeping in view the nature, 
scope and objectives of the enquiry. This bears testimony to the 
following words of A.L. Bowley :

“In the tabulation of the data common sense is the chief requisite
and experience is the chief teacher.”


3.5.1. Parts of a Table. The various parts of a table vary 
from problem to problem depending upon the nature of the data 
and the purpose of the investigation. However, the following are 
a must in a good statistical table :
(i) Table number
(ii) Title
(iii) Head notes or Prefatory notes
(iv) Captions and Stubs
(v) Body of the table
(vi) Foot note
(vii) Source note.

1. Table Number. If a book or an article or a report
contains more than one table then all the tables should be numbered
in a logical sequence for proper identification and easy and ready
reference for future. The table number may be placed at the top
of the table either in the centre above the title or in the side of the
title.


2. Title. Every table must be given a suitable title, which
usually appears at the top of the table (below the table number
or next to the table number). A title is meant to describe in brief
and concise form the contents of the table and should be self
explanatory. It should precisely describe the nature of the data
(criteria of classification, if any) ; the place (i.e., the geographical
or political region or area to which the data relate) ; the time (i.e.,
period to which the data relate) and the source of the data. The
title should be brief but not an incomplete one and not at the cost
of clarity. It should be unambiguous and properly worded and
punctuated. Sometimes it becomes desirable to use long titles
for the sake of clarity. In such a situation a ‘catch title’ may be
given above the ‘main title’. Of all the parts of the table, title
should be most prominently lettered.


3. Head note (or Prefatory notes). If need be, head
note is given just below the title in a prominent type usually centred
and enclosed in brackets for further description of the contents of
the table. It is a sort of a supplement to the title and provides an
explanation concerning the entire table or its major parts, like
captions or stubs. For instance, the units of measurements are
usually expressed as head notes such as ‘in hectares’, ‘in millions’, ‘in
quintals’, ‘in Rupees’, etc.

4. Captions and Stubs. Captions are the headings or
designations for vertical columns and stubs are the headings or
designations for the horizontal rows. They should be brief, concise and
self explanatory. Captions are usually written in the middle of the
columns in small letters to economise space. If the same unit is
used for all the entries in the table then it may be given as a head
note along with the title. However, if the items in different columns
or rows are measured or expressed in different units, then the
corresponding units should also be indicated in the columns or rows.
Relative units like ratios, percentages etc., if any, should also be
specified in the respective rows or columns. For instance, the
columns may constitute the population (in millions) of different
countries and rows may indicate the different periods (years).


Quite often two or more columns or rows corresponding to 
similar classifications (or with same headings) may be grouped 
together under a common heading to avoid repetitions and may be 
given what are called sub-captions or sub-stubs. It is also desir- 
able to number each column and row for reference and to facilitate 
comparisons. 


5. Body of the Table. The arrangement of the data
according to the descriptions given in the captions (columns) and stubs
(rows) forms the body of the table. It contains the numerical
information which is to be presented to the readers and forms the
most important part of the table. Undesirable and irrelevant (to
the enquiry) information should be avoided. To increase the
usefulness of the table, totals must be given for each separate class/
category immediately below the columns or against the rows. In
addition, the grand totals for all the classes for rows/columns
should also be given.


6. Foot Note. When some characteristic or feature or item 
of the table has not been adequately explained and needs further 
elaboration or when some additional or extra information is requir- 
ed for its complete description, foot notes are used for this purpose. 
As the name suggests, foot notes, if any, are placed at the bottom 
of the table directly below the body of the table. Foot notes may 
be attached to the title, captions, stubs or any part of the body of 
the table. Foot notes are identified by the symbols *, **, ***,
†, @ etc.


7. Source Note. If the source of the table is not explicitly
contained in the title, it must be given at the bottom of the table,
below the foot note, if any. The source note is required if
secondary data are used. If the data are taken from a research
journal or periodical, then the source note should give the name of
the journal or periodical along with the date of publication, its
volume number, table number (if any), page number etc., so that
anybody who uses this data may satisfy himself (if need be) about
the accuracy of the figures given in the table by referring to the
original source. Source note will also enable the user to decide about
the reliability of the data since, to the learned users of Statistics,
the reputations of the sources may vary greatly from one agency
to another.


The format of a blank table is given on page 102: 


Remarks 1. A table should be so designed that it is neither 
too long and narrow nor too short and broad. It should be of 
reasonable size adjusted to the space at our disposal and should 
have an attractive get up. If the data are very large they should not 
be crowded in a single table which would become unwieldy and 
difficult to comprehend. In such a situation it is desirable to split 
the large table into a number of tables of reasonable size and 
shape. Each table should be complete in itself. 


2. If the figures corresponding to certain items in the table 
are not available for some reason, then the gaps arising therefrom 
should be filled by writing N.A., which is used as an abbreviation 
for ‘not available’. 


3.5.2. Requisites of a Good Table. As pointed out earlier, 
no hard and fast rules can be laid down for preparing a statistical 
table. Preparation of a good statistical table is a specialised job 
and requires great skill, experience and common sense on the part 
of the tabulator. However, commensurate with the objectives and 
scope of the enquiry, the following points may be borne in mind 
while preparing a good statistical table : 


TABLE 3.10 


FORMAT OF A BLANK TABLE 
TITLE 


[Head Note or Prefatory Note (if any)] 


Caption 


Foot Note: — 
Source Note : 




(i) The table should be simple and compact so that it is 
readily comprehensible. It should be free from all sorts of over- 
lappings and ambiguities. 

(ii) The classification in the table should be so arranged as 
to focus attention on the main comparisons, exhibit the relationship 
between various related items and facilitate statistical 
analysis. It should highlight the relevant and desired information 
needed for further statistical investigation and emphasize the important 
points in a compact and concise way. Different modes of 
lettering (in italics, bold or antique type, capital letters or small 
letters of the alphabet etc.) may be used to distinguish points of 
special emphasis. 

(iii) A table should be complete and self explanatory. It 
should have a suitable title, head note (if necessary), captions and 
stubs and foot note (if necessary). If the data are secondary, the 
source note should also be given [for details see § 3.5.1]. The 
use of dash (—) and ditto marks (,,) should be avoided. Only 
accepted common abbreviations should be used. 

(iv) A table should have an attractive get up which is appeal- 
ing to the eye and the mind so that the reader may grasp it with- 
out any strain. This necessitates special attention to the size of the 
table and proper spacings of rows and columns. 


(v) Since a statistical table forms the basis for statistical 
analysis and computation of various statistical measures like ave- 
rages, dispersion, skewness etc., it should be accurate and free 
from all sorts of errors. This necessitates checking and re-checking 
of the entries in the table at each stage because even a minor error 
of tabulation may lead to very fallacious conclusions and mislead- 
ing interpretations of the results. 

(vi) The classification of the data in the table should be in 
alphabetical, geographical or chronological order or in order of 
magnitude or importance to facilitate comparisons. 

(vii) A summary table [see § 3.5.3] should have adequate 
interpretative figures like totals, ratios, percentages, averages etc. 


3.5.3. Types of Tabulation. Statistical tables are cons- 
tructed in many ways. Their choice basically depends upon : 


(i) Objectives and scope of the enquiry. 
(ii) Nature of the enquiry (primary or secondary). 
(iii) Extent of coverage given in the enquiry. 


The following diagrammatic scheme displays the 
various forms of tables commonly used in practice. 

Types of Tables 

On the basis of objectives or purpose : 
    General purpose or Reference Table 
    Special purpose or Summary Table 

On the basis of nature of enquiry : 
    Original or Primary Table 
    Derived or Derivative Table 

On the basis of coverage : 
    Simple Table 
    Complex Table 


General Purpose (or Reference) and Special Purpose 
(or Summary) Tables. General purpose tables, which are also 
known as reference tables or sometimes informative tables, provide a 
convenient way of compiling and presenting systematically 
arranged data, usually in chronological order, in a form which is 
suitable for ready reference and record without any intention of 
comparative studies, relationship or significance of figures. Most 
of the tables prepared by government agencies, e.g., the detailed 
tables in the census reports, are of this kind. These tables are of 
repository nature, are mainly designed for use by research workers 
and statisticians, and are generally given at the end of the report in the 
form of an appendix. Examples of such tables are : age and sex-wise 
distribution of the population of a particular region, community 
or country ; pay rolls of a business house ; sales orders for 
different products manufactured by a concern ; the distribution of 
students in a university according to age, sex and the faculty they 
join ; and so on. 


As distinct from the general purpose or reference tables, the 
special purpose or summary tables (also sometimes called interpretative 
tables) are of analytical nature and are prepared with the 
idea of making comparative studies and studying the relationship 
and the significance of the figures provided by the data. These are 
generally constructed to emphasise some facts or relationships 
pertaining to a particular or specific purpose. In such tables inter- 
pretative figures like ratios, percentages etc., are used in order to 
facilitate comparisons. Summary tables are sometimes called 
derived or derivative tables (discussed below) as they are generally 
derived from the general purpose tables. 


Original and Derived Tables. On the basis of the nature 
or originality of the data, the tables may be classified into two 
classes : 

(i) Primary tables (ii) Derived or Derivative tables. 

In a primary table, the statistical facts are expressed in the 
original form. It, therefore, contains absolute and actual figures 
and not rounded numbers or percentages. On the other hand, a 
derived or derivative table is one which contains figures and results 
derived from the original or primary data. It expresses the information 
in terms of ratios, percentages, aggregates or statistical 
measures like average, dispersion, skewness etc. For instance, 
time series data is expressed in a primary table but a table expressing 
the trend values and seasonal and cyclic variations is a derived 
table. In practice, mixtures of primary and derived tables are 
generally used, an illustration being given below : 
TABLE 3.11 
LOAD CARRIED BY RAILWAYS AND ROAD TRANSPORT 
FOR DIFFERENT YEARS (In Billion Tonne Km) 

[Table body not recoverable. Column heads : Year ; Railways ; Road Transport ; Percentage Share (Railways, Road Transport). Only the row label 1975-76 survives.] 
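The step from primary figures to derived figures can be sketched in a few lines of Python. This is only an illustration: the figures below are hypothetical and are not those of Table 3.11, and the function name percentage_shares is an assumed name chosen for the sketch.

```python
def percentage_shares(primary):
    """Return each item's share of the total, in per cent (one decimal place)."""
    total = sum(primary.values())
    return {name: round(100 * value / total, 1) for name, value in primary.items()}


# Hypothetical primary figures in billion tonne km (NOT the values of Table 3.11).
primary_row = {"Railways": 150.0, "Road Transport": 50.0}

# Derived figures: percentage share of each mode of transport.
derived_row = percentage_shares(primary_row)
print(derived_row)
```

The primary row holds the absolute figures; the derived row holds only ratios computed from them, which is the sense in which a table carrying both is a mixture of the two kinds.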


Simple and Complex Tables. In a simple table the data 
are classified w.r.t. a single characteristic and accordingly it is also 
termed a one-way table. On the other hand, if the data are grouped 
into different classes w.r.t. two or more characteristics or criteria 
simultaneously, then we get a complex or manifold table. In particular, 
if the data are classified w.r.t. two (three) characteristics 
simultaneously, we get a two-way (three-way) table. 


Simple Table. As already stated, a simple table furnishes 
information about only one single characteristic of the data. For 
instance Table 3.1 (on page 63) relating to agricultural output of 
different countries (in kg per hectare); Table 3.2 (on page 63) 
giving the density of population (per square kilometre) in different 
cities of India ; Table 3.3 (on page 64) giving the population of 
India (in crores) for different years, are all simple tables. As another 
illustration, the following table giving the imports from principal 
countries (by sea, air and land) for the year 1975-76 is a simple 
table : 

TABLE 3.12 

IMPORTS FROM PRINCIPAL COUNTRIES BY SEA, 
AIR AND LAND FOR 1975-76 

(Rupees in lakhs) 

Country                 Imports 

Australia 
Canada 
France 
West Germany 
Two-way Table. However, if the caption or stub is classified 
into two sub-groups, which means that the data are classified w.r.t. 
two characteristics, we get a two-way table. Thus a two-way table 
furnishes information about two inter-related characteristics of a 
particular phenomenon. For example, the distribution of the 
number of students in a college w.r.t. age (1st characteristic) and 
sex (2nd characteristic) gives a two-way table. As another illustration, 
Table 3.13, which gives the load/distance carried by Railways and 
Road Transport for different years, is a two-way table. 


TABLE 3.13 

LOAD CARRIED BY RAILWAYS AND ROAD TRANSPORT 
FOR DIFFERENT YEARS 

(In Billion Tonne Km.) 

[Table body not recoverable. Only the row label 1975-76 survives.] 


Three-way Tables. If the data are classified simultaneously 
w.r.t. three characteristics, we get a three-way table. Thus a three-way 
table gives us information regarding three inter-related characteristics 
of a particular phenomenon. For example, the classification 
of a given population w.r.t. age, sex and literacy, or the classification 
of the students in a university w.r.t. sex, faculty (Arts, Science, 
Commerce) and the class (1st year, 2nd year, 3rd year of the under-graduate 
courses) will give rise to three-way tables. The tables 
given in Examples 3.13 to 3.16 and 3.17 to 3.19 are three-way tables. 
As another illustration, the following table representing the distribution 
of population of a city according to different age-groups (say, 
five age groups from 0 to 100 years), sex and literacy is a three-way 
table. 


DISTRIBUTION OF POPULATION (IN '000) OF A CITY 

[Table body not recoverable. Column heads : Literates ; Illiterates ; Totals. The last row gives the Column Totals.] 

Higher Order or Manifold Tables. These tables give 
information on a large number of inter-related problems or characteristics 
of a given phenomenon. For example, the distribution of 
students in a college according to faculty, class, sex and year 
(Example 3.20) or the distribution of employees in a business concern 
according to sex, age-groups, years and grades of salary 
(Question 20) gives rise to manifold tables. Manifold or higher 
order tables are commonly used in presenting population census 
data. 

Remark. It may be pointed out that as the order of the table 
goes on increasing, the table becomes more and more difficult to 
comprehend and might even become confusing. In practice, in a 
single table only up to three or sometimes four characteristics are 
represented simultaneously. If the study covers more than four 
characteristics at a time, then it is desirable to represent the data in 
more than one table for depicting the relationship between different 
characteristics. 
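The distinction between one-way, two-way and three-way tables can be sketched by counting the same records w.r.t. one, two or three characteristics. The eight records below are hypothetical, and the helper name classify is an assumption made only for this illustration.

```python
from collections import Counter

# Hypothetical records: (age group, sex, literacy) for eight persons.
records = [
    ("0-20", "Male", "Literate"),
    ("0-20", "Female", "Illiterate"),
    ("20-40", "Male", "Literate"),
    ("20-40", "Female", "Literate"),
    ("20-40", "Male", "Illiterate"),
    ("40-60", "Female", "Literate"),
    ("40-60", "Male", "Literate"),
    ("40-60", "Female", "Illiterate"),
]


def classify(rows, *positions):
    """Count rows by the characteristics found at the given tuple positions."""
    return Counter(tuple(row[p] for p in positions) for row in rows)


one_way = classify(records, 1)          # simple table: sex only
two_way = classify(records, 0, 1)       # two-way table: age group and sex
three_way = classify(records, 0, 1, 2)  # three-way table: age group, sex, literacy

for cell, count in sorted(three_way.items()):
    print(cell, count)
```

Each additional characteristic multiplies the number of cells, which is why tables of order higher than three or four quickly become hard to read.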


Example 3.13. Present the following information in a suitable 
tabular form, supplying the figures not directly given : 

In 1965 out of total 2000 workers in a factory, 1850 were 
members of a trade union. The number of women workers employed 
was 250, out of which 200 did not belong to any trade union. 

In 1970, the number of union workers was 1725 of which 1600 
were men. The number of non-union workers was 380, among which 155 
were women. [I.C.W.A. (Intermediate) June, 1975] 




Solution. 


COMPARATIVE STUDY OF THE MEMBERSHIP OF 
TRADE UNION IN A FACTORY IN 1965 AND 1970 

[Table body not recoverable.] 


Example 3.14. In a sample study about coffee habit in two 
towns, the following information was received : 

Town A : Females were 40% ; Total coffee drinkers were 45% 
and Male non-coffee drinkers were 20%. 

Town B : Males were 55% ; Male non-coffee drinkers were 
30% ; and Female coffee drinkers were 15%. 

Present the above data in a tabular form. 

[Delhi Univ. B.Com. (Hons.) 1978] 
Solution. 


PERCENTAGE OF COFFEE DRINKERS IN TOWNS A & B 


[Table body not recoverable. Column heads : Males ; Females.] 

Remark. Figures in italic type are not given in the data. 
These have been obtained from the totals and the given figures by 
minor calculations (subtractions/additions). 
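The subtractions and additions referred to in the Remark can be sketched as follows. This is an illustrative reconstruction and not the book's own working; it assumes that the Town B male share reads 55%, and the helper name two_by_two is hypothetical.

```python
def two_by_two(male_coffee, male_non, female_coffee, female_non):
    """Assemble a percentage table with row and column totals."""
    return {
        "Coffee drinkers": {
            "Males": male_coffee,
            "Females": female_coffee,
            "Total": male_coffee + female_coffee,
        },
        "Non-coffee drinkers": {
            "Males": male_non,
            "Females": female_non,
            "Total": male_non + female_non,
        },
        "Total": {
            "Males": male_coffee + male_non,
            "Females": female_coffee + female_non,
            "Total": male_coffee + male_non + female_coffee + female_non,
        },
    }


# Town A: females 40%, coffee drinkers 45%, male non-coffee drinkers 20%.
a_males = 100 - 40                    # males = 60
a_male_coffee = a_males - 20          # male coffee drinkers
a_female_coffee = 45 - a_male_coffee  # female coffee drinkers
a_female_non = 40 - a_female_coffee   # female non-coffee drinkers
town_a = two_by_two(a_male_coffee, 20, a_female_coffee, a_female_non)

# Town B: males 55% (assumed reading), male non-coffee drinkers 30%,
# female coffee drinkers 15%.
b_male_coffee = 55 - 30
b_female_non = (100 - 55) - 15
town_b = two_by_two(b_male_coffee, 30, 15, b_female_non)

for town, table in (("A", town_a), ("B", town_b)):
    print("Town", town)
    for row, cells in table.items():
        print("  ", row, cells)
```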
Example 3.15. Tabulate the following : 

Out of a total number of 10,000 candidates who applied for jobs 
in a government department, 6,854 were males, 3,146 were graduates 
and others, non-graduates. The number of candidates with some 
experience was 2,623 of whom 1,860 were males. The number of male 
graduates was 2,012. The number of graduates with experience was 
1,093 that includes 323 females. (Bombay U. B.Com. April 1983) 


Solution. 


DISTRIBUTION OF CANDIDATES FOR GOVERNMENT 
JOBS : SEX-WISE, EDUCATION-WISE AND 
EXPERIENCE-WISE 

          |     Graduates      |   Non-graduates    |       Total 
Sex       |   E     NE   Total |   E     NE   Total |   E     NE   Total 

Male      |  770   1242   2012 | 1090   3752   4842 | 1860   4994   6854 
Female    |  323    811   1134 |  440   1572   2012 |  763   2383   3146 

TOTAL     | 1093   2053   3146 | 1530   5324   6854 | 2623   7377  10000 

E = Experienced ; NE = Non-Experienced. 
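The entries of the above table that are not given directly can be obtained by successive subtraction. The following sketch is only an illustration of that working; it takes the number of experienced female graduates as 323, the figure appearing in the solution table, and the helper name split is hypothetical.

```python
# Figures stated in Example 3.15.
total = 10_000
males = 6_854
graduates = 3_146
experienced = 2_623
experienced_males = 1_860
male_graduates = 2_012
experienced_graduates = 1_093
experienced_female_graduates = 323  # as shown in the solution table


def split(group_total, group_experienced):
    """Return the E / NE / Total cells for one group."""
    return {
        "E": group_experienced,
        "NE": group_total - group_experienced,
        "Total": group_total,
    }


male_grad = split(male_graduates,
                  experienced_graduates - experienced_female_graduates)
female_grad = split(graduates - male_graduates, experienced_female_graduates)
male_non = split(males - male_graduates,
                 experienced_males - male_grad["E"])
female_non = split((total - males) - female_grad["Total"],
                   (experienced - experienced_males) - female_grad["E"])

for label, cells in (("Male graduates", male_grad),
                     ("Female graduates", female_grad),
                     ("Male non-graduates", male_non),
                     ("Female non-graduates", female_non)):
    print(label, cells)
```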


Example 3.16. A survey of 370 students from Commerce 
Faculty and 130 students from Science Faculty revealed that 180 
students were studying for only C.A. Examinations, 140 for only Costing 
Examinations and 80 for both C.A. and Costing Examinations. 
The rest had offered part-time Management Courses. Of those studying 
for Costing only, 13 were girls and 90 boys belonged to Commerce 
Faculty. Out of 80 studying for both C.A. and Costing, 72 were from 
Commerce Faculty amongst which 70 were boys. Amongst those who 
offered part-time Management Courses, 50 boys were from Science 
Faculty and 30 boys and 10 girls from Commerce Faculty. In all there 
were 110 boys in Science Faculty. 

Present the above information in a tabular form. Find the 
number of students from Science Faculty studying for part-time 
Management Courses. [C.A. (Intermediate) Dec. 1979] 

FACULTY AND COURSE-WISE DISTRIBUTION 
OF STUDENTS 

[Table body not recoverable.] 

Number of students from Science Faculty studying part-time 
management courses is 60. Figures in italics are not given in 
the data and have been obtained after doing some calculations 
(additions/subtractions). 
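The answer of 60 can be checked by a short calculation. The sketch below assumes that the number studying for Costing only is 140, the value consistent with the stated answer; the variable names are chosen only for this illustration.

```python
commerce_students = 370
science_students = 130

ca_only = 180
costing_only = 140  # assumed reading, consistent with the stated answer of 60
both_ca_and_costing = 80

# The rest offered part-time Management Courses.
management = (commerce_students + science_students
              - (ca_only + costing_only + both_ca_and_costing))

# Commerce Faculty: 30 boys and 10 girls offered Management Courses.
commerce_management = 30 + 10
science_management = management - commerce_management

# Science Faculty: 50 boys offered Management Courses, the remainder are girls.
science_management_girls = science_management - 50

print(management, science_management, science_management_girls)
```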


... textile and non-textile areas was 154 and 16. 
[Delhi U. B.Com. (Hons.) 1972] 


Solution. Total number of women interviewed = 1,807 

No. of women from textile areas = 512 

∴ No. of women from non-textile areas = 1,807 - 512 = 1,295 

Total No. of married women in textile areas = 247 + 73 = 320 

Total No. of married women in non-textile areas 
= 49 + 520 = 569 

Total No. of inexperienced women = 1,341 

∴ Total No. of experienced women 
= 1,807 - 1,341 = 466 

Total No. of unmarried women = 918 

∴ Total No. of married women = 1,807 - 918 = 889 

Total No. of unmarried experienced women in textile areas = 154 

and total No. of unmarried experienced women in non-textile areas = 16 

After filling this information in the table, the remaining 
entries in the table of the experience, marital status and area wise 
distribution of the number of women can now be completed by 
subtraction/addition, wherever necessary. 
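The arithmetic of the above solution can be cross-checked as sketched below. Only the figures stated in the solution are used; the variable names are chosen for this illustration.

```python
total_women = 1807
textile = 512
non_textile = total_women - textile

married_textile = 247 + 73
married_non_textile = 49 + 520

inexperienced = 1341
experienced = total_women - inexperienced

unmarried = 918
married = total_women - unmarried

# Cross-check: married women in the two areas add up to all married women.
married_check = married_textile + married_non_textile

# Remaining marginal entries obtained by subtraction.
unmarried_textile = textile - married_textile
unmarried_non_textile = non_textile - married_non_textile

print(non_textile, experienced, married, married_check)
print(unmarried_textile, unmarried_non_textile)
```

That the two area-wise totals of married women add up to 889 is a useful check on the figures before the remaining cells are filled in.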

TABLE SHOWING THE NUMBER OF WOMEN INTERVIEWED 
FOR EMPLOYMENT IN A TEXTILE FACTORY 
ACCORDING TO THEIR MARITAL STATUS, 
EXPERIENCE AND AREA THEY BELONG TO 

[Table body not recoverable. Surviving column heads : Textile Areas ; Exp. ; Total.] 

Note : Exp. indicates Experienced ; In-exp. indicates Inexperienced. 


Example 3.18. Draw up a blank table to show the number of 
candidates sex-wise, appearing in the Pre-university, First year, Second 
year and Third year examinations of a university in the faculties 
of Arts, Science and Commerce in a certain year. 
[I.C.W.A. (Intermediate) Dec. 1974] 

Solution. 

DISTRIBUTION OF CANDIDATES APPEARING IN THE 
UNIVERSITY EXAMINATIONS 

                 |    Arts     |   Science   |  Commerce   |    Totals 
                 | M   F   ST  | M   F   ST  | M   F   ST  | M   F   Total 

Pre-university   | 
First year       | 
Second year      | 
Third year       | 

Column Totals    | 

Note. M indicates Male ; F indicates Female ; ST indicates Sub-Total. 


Example 3.19. Prepare a blank table to show the exports of 
three companies A, B, C to five countries U.K., U.S.A., U.S.S.R., 
France and West Germany, in each of the years 1970 to 1974. 

[I.C.W.A. (Intermediate) Dec. 1975] 


Solution. 


EXPORTS OF THE COMPANIES A, B AND C TO FIVE 
COUNTRIES FROM 1970 TO 1974 (IN MILLION RUPEES) 

[Blank table not recoverable. Surviving row labels : France ; West Germany.] 

Example 3.20. Draw a blank table to present the following 
information regarding the college students according to : 
(a) Faculty : Social Sciences, Commercial Sciences. 
(b) Class : Under-graduate and Post-graduate classes. 
(c) Sex : Male and Female. 
(d) Years : 1970 to 1974. 
[Andhra Pradesh Uni. B.Com. 1976] 


Solution. Please see on page 114. 

EXERCISE 3.2 

1. (a) Explain the terms ‘classification’ and ‘tabulation’ and point out 
their importance in a statistical investigation. What precautions would you 
take in tabulating statistical data ? 
[I.C.W.A. Final Jan. 1979 (O.S.)] 

(b) What are the chief functions of tabulation ? What precautions 
would you take in tabulating statistical data ? 
[Osmania U. B.Com. (Hons.) Nov. 1981] 


(c) What are different parts of a statistical table ? Give an example to 
illustrate. [Delhi U. B.Com. (Hons.) 1980] 


2.(a) Discuss the importance of classification and tabulation in statis- 
tical analysis. (Bombay U. B.Com, May 1978) 


(b) Explain the purpose of classification and tabulation of data. State 


the rules that serve as a guide in tabulation of data. 
(Bombay U. B.Com. May 1979) 


3.(a) What do you mean by tabulation of data? What precautions 
would you take while tabulating data ? 
[Punjabi U. M.A. (Econ.) 1982] 


(b) Distinguish between classification and tabulation of statistical data. 
Mention the requisites of a good statistical table. 
(Mysore U. B.Com. Oct. 1981) 


(c) Distinguish between classification and tabulation. What precautions 
would you take in tabulating data ? 
(Nagarjuna U. B.Com. April 1981) 


4. Comment on the statement “In collection and tabulation of data 
common sense is the chief requisite and experience the chief teacher". 

5. “The statistical table is a systematic arrangement of numerical data 
presented in columns and rows for purposes of comparison.” Explain and 
discuss the various types of tables used іп statistical investigation after the data 
have been collected. 

6. In a trip organised by a college there were 80 persons each of whom 
paid Rs. 15.50 on an average. There were 60 students each of whom paid 
Rs. 16. Members of the teaching staff were charged at a higher rate. The 
number of servants was 5 (all males) and they were not charged anything. The 
number of ladies was 24% of the total of which one was a lady staff member. 

Tabulate the above information. 
(Bombay U. B.Com. 1976) 


7. Tabulate the following data : 

A survey was conducted amongst one lakh spectators visiting on a particular 
day cinema houses showing criminal, social, historical, comic and mythological 
films. The proportion of male to female spectators under survey was 
three to two. It indicated that while the respective percentages of spectators 
seeing criminal, social and historical films were sixteen, twenty-six and eighteen, 
the actual numbers of female viewers seeing these types were four thousand six 
hundred, twelve thousand two hundred, and seven thousand eight hundred 
respectively. The remaining two types of films, namely comic and mythological, 
were seen by forty per cent and one per cent of the male spectators. The 
number of female spectators seeing mythological films was four thousand four 
hundred. 

8. Present the following information in a suitable tabular form. 

“In 1965, out of a total of 1750 workers of a factory, 1200 were members 
of a trade union. The number of women employees was 200 of which 175 did 
not belong to a trade union. In 1970, the number of union workers increased 
to 1580 of which 1290 were men. On the other hand, the number of non-union 
workers fell to 208 of which 180 were men. 


In 1975, there were 1800 employees who belonged to a trade union and 

50 who did not belong to a trade union. Of all the employees in 1975, 300 
were women, of whom only 8 did not belong to a trade union" 

(Mysore U. B.Com. Nov. 1980) 


9. Present the following information in a tabular form : 


In 1975, out of a total of 4,000 workers in a factory, 3,300 were members 
of a trade union. The number of women workers employed was 500 out of 
which 400 did not belong to the union. In 1974, the number of workers in the 
union was 3,450 of which 3,200 were men, The number of non-union workers 
was 760 of which 330 were women. 

(Bombay U. B.Com. Nov. 1980) 


10. A classification of the population of India by livelihood categories 
(agricultural and non-agricultural) according to the 1951 census showed that out 
of a total of 356,628 thousand persons, 249,075 thousand persons belonged to 
the agricultural category. In the agricultural category 71,049 thousand persons were 
self-supporting, 31,069 thousand were earning dependants and the rest were non-earning. 
The number of non-earning persons and self-supporting persons in the 
non-agricultural category were 67,335 thousand and 33,350 thousand respectively. 
The others were earning dependants. 

Tabulate the above information expressing all figures in millions 
(1 million = 1,000 thousand). (Bombay U. B.Com. 1976) 


11. A survey was conducted among 1,00,000 music listeners who were 
asked to indicate their preference for classical, light music, folk songs, film 
songs and pop varieties of music. The male listeners interviewed were as many 
as female listeners. The survey indicated that while the percentages of listeners 
who preferred classical, light classical and folk songs were eight, thirteen and 
four respectively, the actual number of females for each of the first two kinds 
was six thousand. Of the listeners who liked folk songs, the number of male 
listeners was the same as that of female listeners. While film songs were liked by 
a number one and a half times that for all other varieties put together, the number 
for pop music was only a fourth of the number of film song listeners. Sixty per 
cent of the listeners of pop music were females. 


Prepare a table showing the distribution of music listeners according to 
sex and type of music, 


12. What are different parts of a table ? What points should be borne 
in mind while arranging the items in a table ? 


An investigation conducted by the education department in a public 
library revealed the following facts. You are required to tabulate the informa- 
tion as neatly and clearly as you can: 


“In 1950 the total number of readers was 46,000 and they borrowed some 
16,000 volumes. In 1960 the number of books borrowed increased by 4,000 and 
the borrowers by 5%. 


The classification was on the basis of three sections : literature, fiction 
and illustrated news. There were 10,000 and 30,000 readers in the sections 
literature and fiction respectively in the year 1950, In the same year 2,000 and 
10,000 books were lent in the sections illustrated news and fiction respectively. 
Marked changes were seen in 1960. There were 7,000 and 42,000 readers in the 
literature and fiction sections respectively. So also 4,000 and 13,000 books were 
lent in the sections illustrated news and fiction respectively.” 


13. Criticise the following table : 


Castings Weight of Metal Foundry Hours 
up to 5 kgs 70 220 
up to 10 kgs 110 650 
All higher weights 120 810 
Others 30 70 
Total 330 2,010 


[Madurai U. B. Com. Oct. 1978] 


14. (a) What are the components of a good table ? 

(b) Construct a blank table in which could be shown, at two different 
dates and in five industries, the average wages of the four groups, males and 
females, eighteen years and over and under eighteen years. Suggest a suitable 
title. [Delhi U. B.A. Eco. (Hons.) 1972] 


15. State briefly the requirements of a good statistical table, 


Prepare a blank table to show the distribution of population of various 
States and Union Territories of India according to sex and literacy. 
[I.C.W.A. (Intermediate) June 1976] 

16. Draft a blank table to show the distribution of personnel working in 
an office according to (i) sex, (ii) three grades of salary : below Rs. 500, 500-1000, 
above 1000, (iii) age groups below 25 years, 25-40 and 40-60, (iv) 3 years 
1975-76, 1976-77, 1977-78. 

17. Draw up a blank table to show five categories of skilled and unskilled 
workers, i.e., regular, seasonal, casual, clerical and supervisory, further 
divided into family members and paid workers with monthly/daily rate and 
piece rate. [Bombay U. B.Com. 1971] 

18. Draw up in detail, with proper attention to spacing, double lines, 
etc., and showing all sub-totals, a blank table in which could be entered the 
numbers occupied in six industries on two dates, distinguishing males from 
females, and among the latter single, married and widowed. 


19. Draft a blank table to show the following information for the 
United Kingdom to cover the years 1914, 1939, 1949 and 1956. 

(a) Population 

(b) Income-tax collected 

(c) Tobacco duties collected 

(d) Spirits and beer duties collected 

(e) Other taxation. 

Arrange for suitable columns to show also the “per capita” figures for 
(b), (c), (d), (e). Suggest a suitable title. 
[Institute of Company Accountants] 

20. Prepare a blank table showing the number of 
employees in a big business concern according to : 

(a) Sex : Males and Females. 

(b) Five age-groups : Below 25 years, 25 to 35 years, 35 to 45 
years, 45 to 55 years, 55 years and over. 

(c) Two years : 1983 and 1984. 

(d) Three grades of salary : Below Rs. 400 ; Rs. 400 to 700 ; 


4 


Diagrammatic and Graphic 
Representation 


4.1. Introduction. In Chapter 3, we discussed that classification 
and tabulation are the devices of presenting the statistical data 
in neat, concise, systematic and readily comprehensible and intelligible 
form, thus highlighting the salient features. Another important, 
convincing, appealing and easily understood method of presenting 
the statistical data is the use of diagrams and graphs. They are nothing 
but geometrical figures like points, lines, bars, squares, rectangles, 
circles, cubes etc., pictures, maps or charts. 


Diagrammatic and graphic presentation has a number of ad- 
vantages some of which are enumerated below : 


(i) Diagrams and graphs are visual aids which give a bird's 
eye view of a given set of numerical data. They present the data in 
simple, readily comprehensible form. 


(ii) Diagrams are generally more attractive, fascinating and 
impressive than the set of numerical data. They are more appealing 
to the eye and leave a much more lasting impression on the mind as 
compared to the dry and uninteresting statistical figures. Even a 
layman, who has no statistical background, can understand them 
easily. 

(iii) They are more eye-catching and as such are extensively used 
to present statistical figures and facts in most of the exhibitions, 
trade or industrial fairs, public functions, statistical reports etc. 
The human mind has a natural craving and love for beautiful pictures 
and this psychology of the human mind is extensively exploited by 
the modern advertising agencies who give their advertisements in 
the shape of attractive and beautiful pictures. Accordingly diagrams 
and graphs have universal applicability. 


(iv) They register a meaningful impression on the mind almost 
before we think. They also save a lot of time as very little effort is 
required to grasp them and draw meaningful inferences from them. 
An individual may not like to go through a set of numerical figures 
but he may pause for a while to have a glance at the diagrams or 
pictures. It is for this reason that diagrams, graphs and charts find 
a place almost daily in financial/business columns of the newspapers, 
economic and business journals, annual reports of the business 
houses etc. 

(v) When properly constructed, diagrams and graphs readily 
show information that might otherwise be lost amid the details of 
numerical tabulations. They highlight the salient features of the 
collected data, facilitate comparisons among two or more sets of 
data and enable us to study the relationship between them more 
readily. 

(vi) Graphs reveal the trends, if any, present in the data more 
vividly than the tabulated numerical figures and also exhibit the 
way in which the trends change. Although this information is inherent 
in a table, it may be quite difficult and time consuming (and 
sometimes may be impossible) to determine the existence and nature 
of trends from a tabulation of data. 


4.2. Difference between Diagrams and Graphs. No hard 
and fast rules exist to distinguish between diagrams and graphs but 
the following points of difference may be observed : 

(i) In the construction of a graph, generally graph paper is 
used which helps us to study the mathematical relationship (though 
not necessarily functional) between the two variables. On the other 
hand, diagrams are generally constructed on plain paper and are 
used for comparisons only and not for studying the relationship between 
the variables. In diagrams data are presented by devices such 
as bars, rectangles, squares, circles, cubes etc., while in the graphic 
mode of presentation points or lines of different kinds (dots, dashes, 
dot-dash etc.) are used to present the data. 

(ii) Diagrams furnish only approximate information. They do 
not add anything to the meaning of the data and, therefore, are not 
of much use to a statistician or research worker for further mathematical 
treatment or statistical analysis. On the other hand, graphs 
are more obvious, precise and accurate than diagrams and are 
quite helpful to the statistician for the study of slopes, rates of 
change and estimation (interpolation and extrapolation), wherever 
possible. In fact, today, graphic work is almost a must in any 
research work pertaining to the analysis of economic, business or 
social data. 

(iii) Diagrams are useful in depicting categorical and geogra- 
phical data but they fail to present data relating to time series and 
frequency distributions. In fact, graphs are used for the study of 
time series and frequency distributions. 

(iv) Construction of graphs is easier as compared to the cons- 
truction of diagrams. 

In the following sections we shall first discuss the various 
types of diagrams and then the different modes of graphic presentation. 

4.3. DIAGRAMMATIC PRESENTATION 

4.3.1. General Rules for Constructing Diagrams 

1. Neatness. As already pointed out, diagrams are visual 
aids for presentation of statistical data and are more appealing 
and fascinating to the eye and leave a lasting impression on the 
mind. It is, therefore, imperative that they are made very neat, 
clean and attractive by proper size and lettering, and by the use of 
appropriate devices like different colours, different shades (light and 
dark), dots, dashes, dotted lines, broken lines, dot and dash lines 
etc., for filling the in-between space of the bars, rectangles, circles 
etc., and their components. Some of the commonly used devices are 
given below : 


Fig. 4.1 


2. Title and Footnotes. As in the case of a good statistical 
table, each diagram should be given a suitable title to indicate the 
subject matter and the various facts depicted in the diagram. The 
title should be brief, self explanatory, clear and non-ambiguous. 
However, brevity should not be attempted at the cost of clarity. 
The title should be neatly displayed either at the top of the diagram 
or at its bottom. 

If necessary, footnotes may be given at the left hand 
bottom of the diagram to explain certain points or facts not otherwise 
covered in the title. 

3. Selection of Scale. One of the most important factors in
the construction of diagrams is the choice of an appropriate scale.
The same set of numerical data, if plotted on different scales, may
give diagrams differing widely in size and at times might lead
to wrong and misleading interpretations. Hence, the scale should be
selected with great caution. Unfortunately, no hard and fast rules
are laid down for the choice of scale. As a guiding principle, the
scale should be selected consistent with the size of the paper and the
size of the observations to be displayed, so that the diagram obtained
is neither too small nor too big. The size of the diagram should be
reasonable so as to focus attention on the salient features and
important characteristics of the data. The scale showing the values
should be in even numbers or multiples of 5 or 10. The scale(s)
used on both the horizontal and vertical axes should be clearly
indicated. For comparative study of two or more diagrams, the
same scale should be adopted to draw valid conclusions.


4. Proportion Between Width and Height. A proper
proportion between the dimensions (height and width) of the
diagram should be maintained, consistent with the space available.
Here again no hard and fast rules are laid down. In this regard
Lutz, in his book 'Graphic Presentation', has suggested a rule called
the 'root two' rule, viz., the ratio 1 to √2, or 1 to 1.414, between
the smaller side and the larger side respectively. The diagram should
generally be displayed in the middle (centre) of the page.
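
The 'root two' rule can be tried out numerically. The following is only a minimal sketch: the function name and the 10 cm side are hypothetical, chosen for illustration, and are not part of the text.

```python
import math

def root_two_dimensions(smaller_side):
    """Return (smaller, larger) sides in the ratio 1 : sqrt(2)."""
    return smaller_side, smaller_side * math.sqrt(2)

# Hypothetical diagram whose smaller side is 10 cm.
smaller, larger = root_two_dimensions(10.0)
print(f"{smaller:.1f} cm by {larger:.2f} cm")
```

Whether the larger side is taken as the height or the width is left to the space available, as the rule above says.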


5. Choice of a Diagram. A large number of diagrams
(discussed below) are used to present statistical data. The choice of
a particular diagram to present a given set of numerical data is
not an easy one. It primarily depends on the nature of the data, the
magnitude of the observations and the type of people for whom the
diagrams are meant, and it requires a great amount of expertise, skill
and intelligence. An inappropriate choice of diagram for the
given set of data might give a distorted picture of the phenomenon
under study and might lead to wrong and fallacious interpretations
and conclusions. Hence, the choice of a diagram to present the
given data should be made with utmost caution and care.


6. Source-Note and Number. As in the case of tables, a source
note should, wherever possible, be appended at the bottom of the
diagram. This is necessary because, to the learned audience of
Statistics, the reliability of the information varies from source to
source. Each diagram should also be given a number for ready
reference and comparative study.

7. Index. A brief index explaining various types of shades, 
colours, lines and designs used in the construction of the diagram 
should be given for clear understanding of the diagram. 


8. Simplicity. Lastly, diagrams should be as simple as
possible so that they are easily understood even by a layman who does
not have any mathematical or statistical background. If too much
information is presented in a single complex diagram, it will be
difficult to grasp and might even become confusing to the mind. Hence,
it is advisable to draw several simple diagrams rather than one or two
complex diagrams.
4.3.2. Types of Diagrams. A large variety of diagrammatic
devices are used in practice to present statistical data. However, we
shall discuss here only some of the most commonly used diagrams,
which may be broadly classified as follows :

(1) One-dimensional diagrams, viz., line diagrams and bar
diagrams.

(2) Two-dimensional diagrams such as rectangles, squares
and circles or pie diagrams.

(3) Three-dimensional diagrams such as cubes, spheres, prisms,
cylinders and blocks.

(4) Pictograms.

(5) Cartograms.

4.3.3. One-Dimensional Diagrams

A. LINE DIAGRAM 

This is the simplest of all the diagrams. It consists in drawing
vertical lines, the height of each vertical line being proportional to
the frequency. The variate (x) values are presented on a suitable
scale along the X-axis and the corresponding frequencies are presented
on a suitable scale along the Y-axis. Line diagrams facilitate
comparisons, though they are not attractive or appealing to the eye.

Remark. Even time series data may be presented by a line
diagram, by taking the time factor along the X-axis and the variate
values along the Y-axis.

Example 4.1. The following data show the number of accidents
sustained by 314 drivers of a public utility company over a
period of five years.

Number of accidents :

0  1  2  3  4  5  6  7  8  9  10  11

Number of drivers :

Represent the data by a line diagram.
Solution.

LINE DIAGRAM
(X-axis : Number of accidents ; Y-axis : Number of drivers)

Fig. 4.2

B. BAR DIAGRAM


Bar diagrams are one of the easiest and most commonly
used devices for presenting most business and economic data.
They are especially satisfactory for categorical data or series. They
consist of a group of equidistant rectangles, one for each group or
category of the data, in which the values or magnitudes are
represented by the length or height of the rectangles, the width of
the rectangles being arbitrary and immaterial. These diagrams are
called one-dimensional because only one dimension,
viz., the height (or length) of the rectangles, is taken into account to
present the given values. The following points may be borne in
mind while drawing bar diagrams.

(i) All the bars drawn in a single study should be of uniform
(though arbitrary) width, depending on the number of bars to be
drawn and the space available.

(ii) Proper but uniform spacing should be given between 
different bars to make the diagram look more attractive and 
elegant. 

(iii) The heights (lengths) of the rectangles or bars are taken
proportional to the magnitudes of the observations, the scale being
selected keeping in view the magnitude of the largest observation.

(iv) All the bars should be constructed on the same base 
line. 

(v) It is desirable to write the figures (magnitudes) represent- 
ed by the bars at the top of the bars to enable the reader to have a 
precise idea of the value without looking at the scale. 

(vi) Bars may be drawn vertically or horizontally. However, 
in practice, vertical bars are generally used because they give an 
attractive and appealing get up. 

(vii) Wherever possible, the bars should be arranged from left
to right (from top to bottom in the case of horizontal bars) in order
of magnitude to give a pleasing effect.

Types of Bar Diagram. The following are the various types 
of bar diagram in common use : 

(i) Simple bar diagram. 

(ii) Sub-divided or component bar diagram. 

(iii) Percentage bar diagram.

(iv) Multiple bar diagram.

(v) Deviation or Bilateral bar diagram. 
A. SIMPLE BAR DIAGRAM 


Simple bar diagram is the simplest of the bar diagrams and is
used frequently in practice for the comparative study of two or
more items or values of a single variable or a single classification or
category of data. For example, the data relating to sales, profits,
production, population, etc., for different periods may be presented
by bar diagrams. As already pointed out, the magnitudes of the
observations are represented by the heights of the rectangles.


Remark. If there are a large number of items or values of
the variable under study, then a line diagram may be drawn
instead of a bar diagram.


Example 4.2. The following data relating to the strength of
the Indian Merchant Shipping Fleet give the Gross Registered
Tonnage (GRT) for different years.


Year        GRT in '000 as on 31st December
1961              901
1966            1,792
1971            2,500
1975            4,464
1976            5,115


Source : Ministry of Shipping and Transport.
Represent the data by a suitable bar diagram.
Solution.


STRENGTH OF INDIAN MERCHANT SHIPPING FLEET
GROSS REGISTERED TONNAGE IN '000
(simple bar diagram : one bar for each of the years 1961, 1966,
1971, 1975 and 1976)


Fig. 4.3
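
The scaling behind such a bar diagram can be sketched in a few lines. This is an illustrative sketch only: the 10 cm maximum height and the text rendering with '#' marks are assumptions made here, not the construction used for Fig. 4.3.

```python
# GRT in '000 as on 31st December (data of Example 4.2).
grt = {1961: 901, 1966: 1792, 1971: 2500, 1975: 4464, 1976: 5115}

def bar_heights(values, max_height_cm=10.0):
    """Scale the values so that the largest bar is max_height_cm tall."""
    largest = max(values.values())
    return {key: round(v / largest * max_height_cm, 2)
            for key, v in values.items()}

heights = bar_heights(grt)
for year, h in heights.items():
    # Rough text bar (4 marks per cm), with the figure written beside it.
    print(f"{year} | {'#' * round(h * 4):<40} {grt[year]:,}")
```

Since the scale is fixed by the largest observation, every other bar is a simple proportion of it, and the figure written at the end of each bar lets the reader read the value without consulting the scale.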
C. SUB-DIVIDED OR COMPONENT BAR DIAGRAM


A very serious limitation of the simple bar diagram is that it
studies only one characteristic or classification at a time. For
example, the total number of students in a college for the last 5
years can be conveniently expressed by simple bar diagrams, but these
cannot be used if we also have to depict the faculty-wise or sex-wise
distribution of students. In such a situation, a sub-divided or
component bar diagram is used. Sub-divided bar diagrams are useful
not only for presenting several items of a variable or a category
graphically but also for making a comparative study of the different
parts or components among themselves and for studying the
relationship between each component and the whole.

In general, sub-divided or component bar diagrams are to be
used if the total magnitude of the given variable is to be divided
into various parts or sub-classes or components. First of all, a bar
representing the total is drawn. Then it is divided into various
segments, each segment representing a given component of the
total. Different shades or colours, crossing or dotting, or designs
are used to distinguish the various components, and a key or index
is given along with the diagram to explain these differences.


In addition to the general rules for constructing bar diagrams, 
the following points may be kept in mind while constructing sub- 
divided or component bar diagrams : 


(i) To facilitate comparisons, the order of the various
components in different bars should be the same. It is customary to
show the largest component at the base of the bar and the smallest
component at the top, so that the various components appear in the
order of their magnitude.

(ii) As already pointed out, an index or key showing the various
components represented by different shades, dottings, colours, etc.,
should be given.

(iii) The use of a sub-divided bar diagram is not suggested if
the number of components exceeds 10, because in that case the
diagram is loaded with too much information and is not easy to
understand and interpret. A pie or circle diagram (discussed later) is
appropriate in such a situation. The comparison of the various
components in different bars is quite tedious, as they do not have
a common base, and requires great skill and expertise.


Example 4.3. Represent the following data by a suitable
diagram :


Items of Expenditure        Family A              Family B
                            (Income Rs. 500)      (Income Rs. 300)
Food                        150                   150
Clothing                    125                    60
Education                    25                    50
Miscellaneous               190                    70
Saving or Deficit           +10                   −30


[Meerut U. M.A. (Eco.) 1975]

Solution. The data can be represented by a sub-divided bar
diagram as shown below :

SUB-DIVIDED BAR SHOWING
EXPENDITURE OF TWO FAMILIES
(Y-axis : Expenditure (in rupees) ; one sub-divided bar each for
Family A and Family B)


Fig. 4.4
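
The segment boundaries of the two sub-divided bars can be worked out as running totals. The sketch below uses the expenditure figures of Example 4.3 and keeps the components in the same order in both bars; how the saving or deficit is drawn is not modelled here, and the function name is hypothetical.

```python
# Expenditure in Rs. (Example 4.3); saving/deficit is left out of the stack.
items = ["Food", "Clothing", "Education", "Miscellaneous"]
family_a = [150, 125, 25, 190]
family_b = [150, 60, 50, 70]

def segments(values):
    """Return the (bottom, top) of each stacked segment, in the given order."""
    bounds, bottom = [], 0
    for v in values:
        bounds.append((bottom, bottom + v))
        bottom += v
    return bounds

for name, values in (("Family A", family_a), ("Family B", family_b)):
    print(name)
    for item, (low, high) in zip(items, segments(values)):
        print(f"  {item:<14} {low:>3} to {high:>3}")
```

The top of the last segment is the total expenditure of the family (Rs. 490 and Rs. 330), which is the height of the whole bar.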
D. PERCENTAGE BAR DIAGRAM 


Sub-divided or component bar diagrams presented graphically
on a percentage basis give percentage bar diagrams. They are
specially useful for the diagrammatic portrayal of the relative
changes in the data. A percentage bar diagram is used to highlight
the relative importance of the various component parts to the whole.
The total for each bar is taken as 100 and the value of each
component or part is expressed as a percentage of the respective
total. Thus, in a percentage bar diagram, all the bars will be of the
same height, viz., 100, while the various segments of the bar
representing the different components will vary in height depending
on their percentage values to the total. Percentage bars are quite
convenient and useful for comparing two or more sets of data.

Example 4.4. Following is the break-up of the expenditure of
a family on different items of consumption. Draw a percentage bar
diagram to represent the data.


Item                      Expenditure (Rs.)
Food                      240
Clothing                   66
Rent                      125
Fuel and Lighting          57
Education                  42
Miscellaneous             190

Solution. First of all we convert the given figures into
percentages of the total expenditure as detailed below.


Item                  Expenditure    %                          Cumulative %
                      (Rs.)
Food                  240            (240/720) × 100 = 33.33     33.33
Clothing               66            (66/720) × 100  =  9.17     42.50
Rent                  125            (125/720) × 100 = 17.36     59.86
Fuel and lighting      57            (57/720) × 100  =  7.92     67.78
Education              42            (42/720) × 100  =  5.83     73.61
Miscellaneous         190            (190/720) × 100 = 26.39    100.00

Total                 720            100

DIAGRAM SHOWING EXPENDITURE OF FAMILY
ON DIFFERENT ITEMS OF CONSUMPTION
(percentage bar diagram ; Y-axis : Percentage expenditure, 0 to 100 ;
segments : food, clothing, rent, fuel and lighting, education and
miscellaneous)

Fig. 4.5

Example 4.5. Draw a bar chart for the following data showing
the percentage of total population in villages and towns :

                                  Percentage of total population in
Category                          Villages        Towns
Infants and young children        13.7            12.9
Boys and girls                    25.1            23.2
Young men and women               32.3            36.5
Middle-aged men and women         20.4            20.1
Elderly persons                    8.5             7.3
Solution.
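CALCULATIONS FOR PERCENTAGE BAR DIAGRAMS


                                  Villages                Towns
Category                          %      Cumulative %     %      Cumulative %
Infants and young children        13.7    13.7            12.9    12.9
Boys and girls                    25.1    38.8            23.2    36.1
Young men and women               32.3    71.1            36.5    72.6
Middle-aged men and women         20.4    91.5            20.1    92.7
Elderly persons                    8.5   100.0             7.3   100.0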

PERCENTAGE BAR DIAGRAM SHOWING TOTAL POPULATION
IN VILLAGES AND TOWNS BY DIFFERENT CATEGORIES
(Y-axis : Percentage population ; one bar each for villages and
towns ; index : infants and young children, boys and girls, young men
and women, middle-aged men and women, elderly persons)

Fig. 4.6

E. MULTIPLE BAR DIAGRAM


A limitation of the simple bar diagram is that it can be used
to portray only a single characteristic or category of the data. If
two or more sets of inter-related phenomena or variables are to be
presented graphically, multiple bar diagrams are used. The
technique of drawing a multiple bar diagram is basically the same as
that of drawing a simple bar diagram. In this case, a set of adjacent
bars (one for each variable) is drawn. Proper and equal spacing is
given between different sets of bars. To distinguish between the
different bars in a set, different colours, shades, dottings or
crossings may be used, and a key or index to this effect may be given.


Example 4.6. The data below give the yearly profits (in
thousands of rupees) of two companies A and B.


Profits (in '000 rupees)


Company A Company B 


1970—71 


Represent the data by means of a suitable diagram.
[I.C.W.A. (Intermediate) June 1975]


Solution. The data can be suitably represented by a multiple
bar diagram as shown below.


YEARLY PROFITS IN '000 RUPEES
(multiple bar diagram : adjacent bars for Company A and Company B
for each of the years 1966-67 to 1970-71)


Fig. 4.7


Remark. A careful examination of the above figures of
profits for the two companies A and B reveals that in all the years
from 1966-67 to 1970-71, company A shows higher profits than
company B. In such a situation, when the values of one concern or
unit exceed the values of the other concern or unit
for all the periods under consideration, the data can be elegantly
represented by a special type of sub-divided bar diagram, in which
the total refers to the values of the concern or unit with higher
values and the lower portion (shaded) of the bar shows the values of
the other concern. The remaining portion (blank) shows the balance
(excess) between the two concerns or units. We represent below the
above data in this manner.


YEARLY PROFITS IN '000 RUPEES
(sub-divided bars for the years 1966-67 to 1970-71 ; shaded lower
portion : the lower profit ; blank upper portion : excess)


Fig. 4.8


Example 4.7. The following data show the students (in millions)
on rolls at school/university stage in India according to different
class groups and sex for the year 1970-71, as on 31st March.


Class I to V
Class VI to VIII
Class IX to XI
University/College


Represent the data by (i) a component bar diagram and (ii) a
multiple bar diagram.

Solution.


(i)

COMPONENT BAR DIAGRAM SHOWING
STUDENTS ON ROLL AT SCHOOL/UNIVERSITY
STAGE ACCORDING TO SEX IN 1970-71
(Y-axis : Number of students in millions ; bars for Class I-V,
Class VI-VIII, Class IX-XI and University/College, each sub-divided
into boys and girls)


Fig. 4.9


(ii)

MULTIPLE BAR DIAGRAM SHOWING
STUDENTS ON ROLL AT SCHOOL/UNIVERSITY
STAGE ACCORDING TO SEX IN 1970-71
(Y-axis : Number of students in millions ; adjacent bars for boys and
girls for Class I-V, Class VI-VIII, Class IX-XI and
University/College)


Fig. 4.10

F. DEVIATION BARS

Deviation bars are specially useful for the graphic presentation of
net quantities, viz., surplus or deficit, e.g., net profit or loss, or
the net of imports and exports, which have both positive and negative
values. The positive deviations (e.g., profits, surplus) are presented
by bars above the base line, while negative deviations (loss, deficit)
are represented by bars below the base line. The following examples
will illustrate the point.

Remark. Deviation bars are also sometimes known as
Bilateral Bar Diagrams and are used to depict plus (surplus) and
minus (deficit) directions from the point of reference.

The following diagram (deviation bars) shows the overall
position of the Indian Railway Budget for the years from 1971-72
onwards.

RAILWAY BUDGET OVERALL POSITION
(RS. IN CRORES)
(deviation bars : surplus (+) above and deficit (−) below the base
line ; X-axis : years, ending with 1978-79 (Revised) and 1979-80
(Budget))


Fig. 4.11
Source : Statesman, 21st Feb. 1979.
For some more illustrations, see Examples 4.3 and 4.14.
G. BROKEN BARS
Broken bars are used for the graphic presentation of data
which contain very wide variations in the values, i.e., data
which contain very large observations along with small
observations. In this case, squeezing the vertical scale will not be
of much help because it will make the small bars look too small and
clumsy and thus will not reveal the true characteristics of the data.
In order to provide an adequate and reasonable shape for the smaller
bars, the larger (or largest) bar(s) may be broken at the top, as
illustrated in Examples 4.8 and 4.9.

Remark. However, if all the observations are fairly large,
so that all the bars would have to be broken, then they can instead
be drawn with a false base line for the vertical axis. [For false
base line, see § 4.4.2, Graphic Presentation.]

Example 4.8. The following data relate to the imports of
foreign merchandise and exports (including re-exports) of Indian
merchandise (in million rupees) for some countries for the year
1975-76.


Country                  Exports

Burma                       53
Czechoslovakia
Canada                     424
Australia                  471
Italy                      785
France                     835
Germany (F.R.)           1,173
Iran                     2,708
United Kingdom           4,020
USSR                     4,128
Japan                    4,263
USA                      5,054


Represent the data by a suitable diagram.
Solution. The above data are represented by bar diagrams as
shown below.
FOREIGN TRADE BY COUNTRIES
1975-76
(horizontal bars for each country : imports to the left and exports
(including re-exports) to the right ; scale in rupees ten million,
marked 0, 200, 400, 600 on either side)


Fig. 4.12

Example 4.9. Represent the following data relating to the
military statistics at the China-Vietnam border during the
China-Vietnam war in February 1979 by a multiple bar diagram.


Category


Army Divisions
Semi-Army Units
Fighter Planes
Tanks

Total Troops
Solution.


MILITARY STATISTICS AT
CHINA-VIETNAM BORDER
(multiple bar diagram : adjacent bars for Vietnam and China for
army divisions, semi-army units, fighter planes, tanks and total
troops)


Fig. 4.13

Source : Indian Express, 20th February, 1979.

4.3.4. Two-Dimensional Diagrams. Line or bar diagrams
discussed so far are one-dimensional diagrams, since the magnitudes
of the observations are represented by only one of the dimensions,
viz., the height (length) of the bars, while the width of the bars is
arbitrary and uniform. However, in two-dimensional diagrams, both the
dimensions, viz., length and width, are taken into account, the
magnitudes being represented by the areas of the diagrams. The
commonly used two-dimensional diagrams are :

(i) Rectangles

(ii) Squares

(iii) Circles

(iv) Angular or pie diagrams.

(A) Rectangles. A 'rectangle' is a two-dimensional diagram
because it is based on the area principle. Since the area of a
rectangle is given by the product of its length and breadth, in a
rectangle diagram both the dimensions, viz., length (height) and
width of the bars, are taken into consideration.

Just like bars, the rectangles are placed side by side, proper
and equal spacing being given between different rectangles. In fact,
rectangle diagrams are a modified form of bar diagrams and give
more detailed information than bar diagrams.

Like sub-divided bars, we also have sub-divided rectangles for
depicting the total and its break-up into various components.
Likewise, a percentage rectangle diagram may be used to portray the
relative magnitudes of two or more sets of data and their
components making up the total. We give below a few illustrations.

Example 4.10. Prepare a rectangular diagram from the following
particulars relating to the production of a commodity in a factory.


Units produced                 1,000
Cost of raw materials          Rs. 5,000
Direct expenses                Rs. 2,000
Indirect expenses              Rs. 1,000
Profit                         Rs. 1,000


(Delhi U. B.Com. 1977)
Solution. First of all we find the cost of material,
expenses and profit per unit as given below.
DIAGRAM SHOWING COST AND PROFIT FOR A
COMMODITY IN A FACTORY
(rectangle with base : units produced (1,000) ; height : cost and
profit per unit (in Rs.), sub-divided into raw materials, direct
expenses, indirect expenses and profit)


Fig. 4.14

Cost of raw material per unit  = Rs. 5
Direct expenses per unit       = Rs. 2
Indirect expenses per unit     = Rs. 1
Profit per unit                = Rs. 1
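
The per-unit figures, and the area principle behind the rectangle, can be verified as follows. This is a small sketch with hypothetical variable names; it treats the base of the rectangle as the units produced and the height as the per-unit amounts, as in Example 4.10.

```python
# Particulars of Example 4.10 (amounts in Rs.).
units = 1000
totals = {
    "Raw materials": 5000,
    "Direct expenses": 2000,
    "Indirect expenses": 1000,
    "Profit": 1000,
}

# Height of each component of the rectangle = amount per unit.
per_unit = {name: amount / units for name, amount in totals.items()}

# Total height of the rectangle = cost plus profit per unit.
total_height = sum(per_unit.values())

# Area of each component = base x height, which recovers the given totals.
areas = {name: units * h for name, h in per_unit.items()}

for name, h in per_unit.items():
    print(f"{name:<18} Rs. {h:g} per unit, area {areas[name]:g}")
print("Total height (Rs. per unit):", total_height)
```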


Example 4.11. The following data relate to the monthly
expenditure (in Rs.) of two families A and B.


Items of Expenditure          Expenditure (in Rs.)
                              Family A        Family B


Rent
Light and fuel
Miscellaneous


Represent it by a suitable percentage diagram.


Solution. Since the total expenses of the two families are 
different, an appropriate percentage diagram for the above data 
will be rectangular diagram on percentage basis. The percentage 
bar diagram will not be able to reflect the inherent differences in 
the total expenditures of the two families. 


The widths of the rectangles will be taken in the ratio of the
total expenses of the two families, viz., 400 : 240, i.e., 5 : 3.


CALCULATIONS FOR PERCENTAGE RECTANGULAR
DIAGRAM


Item of Expenditure      Family A : Rs., %, Cumulative %      Family B : Rs., %, Cumulative %


PERCENTAGE RECTANGLE DIAGRAM
SHOWING MONTHLY EXPENDITURE OF
TWO FAMILIES A AND B
(rectangles for Family A and Family B with widths in the ratio
5 : 3 ; Y-axis : percentage expenditure, 0 to 100)


Fig. 4.15
Example 4.12. Represent the following data by a percentage
sub-divided bar diagram.


Item of                Family A              Family B
Expenditure            Income Rs. 500        Income Rs. 300
Food                   150                   150
Clothes                125                    60
Education               25                    50
Miscellaneous          190                    70
Savings or Deficit     +10                   −30


[Delhi U. B.Com. (Hons.) 1982]
Solution. Since the total incomes of the two families are
different, an appropriate percentage diagram for the above data
will be a rectangular diagram on a percentage basis. The percentage
bar diagram will not be able to reflect the inherent differences in
the total incomes of the two families.

The widths of the rectangles will be taken in the ratio of the
total incomes of the families, viz., 500 : 300, i.e., 5 : 3.

CALCULATIONS FOR PERCENTAGE RECTANGULAR
DIAGRAM


Item of Expenditure      Family A : Expenditure (Rs.), %, Cumulative %      Family B : Expenditure (Rs.), %, Cumulative %
Total                    500                                                 300


PERCENTAGE DIAGRAM SHOWING MONTHLY
INCOME AND EXPENDITURE OF TWO FAMILIES
A AND B
(rectangles for Family A (income Rs. 500) and Family B (income
Rs. 300) with widths in the ratio 5 : 3 ; Y-axis : percentage
expenditure ; components : food, clothes, education and
miscellaneous ; saving 2% for Family A and deficit 10% for Family B)


Fig. 4.16.
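
The width ratio and the percentages used in such a percentage rectangle diagram can be sketched as below. The helper names are hypothetical; the sketch uses the incomes of the two families, the full break-up for Family A, and the deficit of Rs. 30 for Family B, each expressed as a percentage of the family's income.

```python
from math import gcd

incomes = {"Family A": 500, "Family B": 300}

def width_ratio(a, b):
    """Reduce a : b to its lowest terms."""
    g = gcd(a, b)
    return a // g, b // g

def as_percent_of(amount, base):
    """Express amount as a percentage of base, to two decimals."""
    return round(100 * amount / base, 2)

family_a = {"Food": 150, "Clothes": 125, "Education": 25,
            "Miscellaneous": 190, "Saving": 10}
percent_a = {item: as_percent_of(rs, incomes["Family A"])
             for item, rs in family_a.items()}
deficit_b = as_percent_of(-30, incomes["Family B"])

print("Widths in the ratio", width_ratio(500, 300))
print("Family A percentages:", percent_a)
print("Family B deficit (% of income):", deficit_b)
```

A negative percentage corresponds to the portion of the rectangle drawn below the base line.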
Example 4.13. Draw a suitable diagram to represent the
following information.


              Selling price    Quantity    Cost (in Rs.)
              per unit         sold        Wages     Materials    Misc.     Total

Factory X
Factory Y     600              30          6,000     6,000        9,000     21,000


Show also the profit or loss as the case may be.
[Delhi U. B.Com. (Hons.) 1973]

Solution. First of all we shall calculate the cost (wages,
materials, misc.) and profit per unit as given in the following table.


              Selling price    Quantity    Cost per unit (in Rs.)                  Profit per
              per unit         sold        Wages    Materials    Misc.    Total    unit (in Rs.)

Factory X                      20          160      120          80       360
Factory Y     600              30          200      200          300      700      −100

Note. Negative profit is regarded as loss.

An appropriate diagram for representing this data would be
'rectangles' whose widths are in the ratio of the quantities sold,
i.e., 20 : 30, i.e., 2 : 3. Selling prices would be represented by the
corresponding heights of the rectangles, with the various factors of
cost (wages, materials, misc.) and profit or loss represented by the
various divisions of the rectangles, as shown in the following
diagram.

Remark. In the case of profit, i.e., when the selling price (S.P.)
is greater than the cost price (C.P.), the entire rectangle will lie
above the X-axis, the segment just above the X-axis showing profit.
But in the case of loss, i.e., when S.P. is less than C.P., we will
have the rectangle with a portion lying below the X-axis, which will
reflect the loss incurred, i.e., the cost not recovered through sales.
The values of each component are given by the product of the base with
the corresponding height of the component (rectangle). For example,
for factory X, the area of the component for wages is 20 × 160 =
3,200, which is the given cost.

SUB-DIVIDED RECTANGLE SHOWING COST, SALES AND
PROFIT OR LOSS PER UNIT
(rectangles for Factory X (20 units) and Factory Y (30 units) ;
Y-axis : sales (in rupees), extending below zero to −100)


Fig. 4.17.
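
The per-unit calculation for Factory Y can be reproduced as a check. This is only a sketch: the variable names are hypothetical, and the assignment of the three cost figures to wages, materials and miscellaneous follows the order in which the text lists these heads, which is an assumption.

```python
# Factory Y of Example 4.13 (amounts in Rs.).
selling_price = 600      # per unit
quantity = 30            # units sold
costs = {"Wages": 6000, "Materials": 6000, "Miscellaneous": 9000}

# Heights of the cost components of the rectangle (Rs. per unit).
per_unit = {head: amount / quantity for head, amount in costs.items()}
cost_per_unit = sum(per_unit.values())

# A negative profit per unit is a loss, drawn below the X-axis.
profit_per_unit = selling_price - cost_per_unit

total_sales = selling_price * quantity
total_profit = total_sales - sum(costs.values())

print(per_unit, cost_per_unit, profit_per_unit)
print(total_sales, total_profit)
```

The area of each component (base 30 times the per-unit height) returns the corresponding total cost, which is the area principle stated in the remark above.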


Aliter. The above data can also be expressed as follows :


                          Factory Y

Total sales (in Rs.)      600 × 30 = 18,000
Total cost (in Rs.)       21,000
Profit (in Rs.)           −3,000


Note. Negative profit is regarded as loss.
Data are now expressed diagrammatically by a sub-divided
(or component) bar diagram, by taking the total sales as bars along
the Y-axis and the various factors of cost (viz., wages, materials,
miscellaneous) and profit as the various divisions of the bars, as
shown below.

SUB-DIVIDED BAR DIAGRAM SHOWING COST, SALES AND
PROFIT OR LOSS
(one sub-divided bar each for Factory X and Factory Y ;
Y-axis : sales (in rupees))


Fig. 4.18

B. Square Diagrams. Among the two-dimensional
diagrams, squares are specially useful if it is desired to compare
graphically values or quantities which differ widely from one another,
e.g., the population of different countries at a given time or of
the same country at different times, or the imports or exports of
different countries. In such a situation, bar diagrams are not
suitable, since they will give very disproportionate bars, i.e., the
bars corresponding to smaller quantities would be comparatively too
small and those corresponding to bigger values would be too big.
In particular, if two values are in the proportion of 1 : 25 and
we draw a bar diagram, the bigger bar will be 25 times as high as
the smaller one. In such a situation, square diagrams give a better
presentation.


Like the rectangle diagram, the square diagram is a two-dimensional
diagram in which the given values are represented by the area of
the square. Since the area of a square is given by the square of its
side, the side of the square will be in proportion to the
square root of the given observation. Thus, if two observations
are in the ratio of 1 : 25, the sides of the squares will be in the
ratio of their square roots, viz., 1 : 5.

Construction of the square diagrams is quite simple. First of 
all we obtain the square roots of the given observations and then 
squares are drawn with sides proportional to these square roots, on 
an appropriate scale which must be specified. 

Remarks 1. The squares may be drawn horizontally (on the
same base line) or vertically, one below the other, to facilitate
comparisons. However, in practice, the first method, viz., horizontal
presentation, is generally used since it economises space.


2. Although square diagram is a two dimensional diagram, it 
is used to depict only a single magnitude or value. 
Example 4.14. Draw a square diagram to represent the following
data :

Country                          A        B        C
Yield in (kg.) per hectare       350      647      1,120
(Bombay U. B.Com. 1976)


Solution. The square roots of the given yields in (kg.) per
hectare give the proportion of the sides of the corresponding squares.
The calculations are shown in the following table :


Country                               A           B           C
Yield in (kg.) per hectare            350         647         1,120
Square root                           18.7083     25.4362     33.4664
Ratio of the sides of the squares     1           1.36        1.79


YIELD IN KG (PER HECTARE)
(squares for countries A, B and C drawn on the same base line)

SCALE : 1 SQUARE CM = 350 KG


Fig. 4.19


Remark. In the above table, the ratio of the sides of the
squares has been obtained on dividing the square roots of the values
for B and C by the square root of the value for A. This is easy to
do on a calculator but, without a calculator, it is quite time
consuming. However, things can be simplified to a great extent by
dividing the square roots of the values of A, B and C by a
whole number, say 15 or 18. Division by, say, 15 gives the sides of
the squares for A, B and C as 1.25, 1.70 and 2.23 respectively.

If we construct squares with sides 1.25 cms., 1.70 cms. and
2.23 cms., the scale will be obtained as follows :


Area of the square for A is (1.25)² = 1.5625 square cms. This area
represents the value 350 kg. Thus,


1.5625 sq. cms. = 350 kg.

⇒ 1 sq. cm. = 350/1.5625 = 224 kg.
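
The sides and the scale worked out above can be reproduced with a short computation. This is a minimal sketch with hypothetical names; the divisor 15 is the one suggested in the remark, and the scale is derived from the square for A.

```python
import math

# Yield in kg per hectare (Example 4.14).
yields = {"A": 350, "B": 647, "C": 1120}

divisor = 15  # convenient whole number suggested in the remark

# Side of each square (in cm) = square root of the value / divisor.
sides = {country: round(math.sqrt(value) / divisor, 2)
         for country, value in yields.items()}

# Scale: the area of the square for A represents 350 kg.
kg_per_sq_cm = yields["A"] / sides["A"] ** 2

print(sides)
print("1 sq. cm =", kg_per_sq_cm, "kg")
```

Because the sides are rounded to two decimals, the areas of the squares for B and C represent their values only approximately on this scale.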


C. Circle Diagram. Circle diagrams are an alternative to
square diagrams and are used for the same purpose, viz., for the
diagrammatic presentation of values which differ widely in
magnitude. Since the area of a circle is πr², i.e., proportional to
the square of its radius r, the radius is proportional to the square
root of the given magnitude. Accordingly, the lengths which were
taken as the sides of the squares may also be taken as the radii of
the circles representing the given magnitudes.


Remarks 1. Circle diagrams involve the same amount of work as
square diagrams, viz., computing the square roots of the given
magnitudes ; if anything, circles are easier to draw.

2. Since square and circle diagrams are to be compared on
an area basis, it is difficult to judge the relative magnitudes with
precision, particularly for a layman without any mathematical or
statistical background. Accordingly, proper care should be taken
in interpreting them. They are also more difficult to construct than
rectangle diagrams.

3. Scale. The scale to be used for constructing circle
diagrams can be calculated as follows :


For a given magnitude 'a' we have
Area = πr² square units = a

⇒ 1 square unit = a/(πr²).

Example 4.15. Represent the data of Example 4.14 by a
circular diagram.

Solution. The data of Example 4.14 can be represented by
a circular diagram on taking the lengths of the sides of the squares,
which were taken in Example 4.14, as the radii of the corresponding
circles. However, in this case, the scale will be modified
accordingly.

Scale : 1 sq. cm. = 350/(22/7) = 2450/22 = 111.36 kg.
(taking π = 22/7 and the radius of the circle for A as 1 cm.)


YIELD IN KG (PER HECTARE)

SCALE : 1 SQ. CM. = 111.36 KG

Fig. 4.20
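As a rough check of the scale, the rule "1 square unit = a/πr²" can be sketched as below. The radius of 1 cm. for the circle representing 350 kg is an assumption inferred from the printed scale (350/π); the function name is ours.

```python
import math

def circle_scale(value, radius, pi=math.pi):
    """Magnitude represented by one square unit of a circle of the given radius."""
    return value / (pi * radius ** 2)

# Radius 1 cm. for the 350 kg circle is an assumption inferred from the scale.
print(round(circle_scale(350, 1.0, pi=22 / 7), 2))   # with pi taken as 22/7
print(round(circle_scale(350, 1.0), 2))              # with the library value of pi
```

The small difference between the two results comes only from the approximation 22/7 for π.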


D. Angular or Pie Diagram. Just as sub-divided and
percentage bars or rectangles are used to represent the total
magnitude and its various components, the circle (representing the
total) may be divided into various sections or segments, viz.,
sectors representing the proportion or percentage of the various
component parts to the total. Such a sub-divided circle diagram is
known as an angular or pie diagram, named so because the various
segments resemble slices cut from a pie.


Steps for Construction of Pie Diagram 


(i) Express each of the component values as a percentage of
the respective total.

(ii) Since the angle at the centre of the circle is 360°, the total
magnitude of the various components is taken to be equal to 360°
and each component part is to be expressed proportionately in
degrees. Since 1 per cent of the total value is equal to 360/100
= 3.6°, the percentages of the component parts obtained in (i) can
be converted to degrees by multiplying each of them by 3.6.

(iii) Draw a circle of appropriate radius using an appropriate
scale depending on the space available. If only one category
or characteristic is to be used, the circle may be drawn of any
radius. However, if two or more sets of data are to be presented
simultaneously for comparative studies, then the radii of the
corresponding circles are to be proportional to the square roots of
their total magnitudes.

(iv) Having drawn the circle, draw any radius (preferably
horizontal). Now, with this radius as the base line, draw an angle
at the centre (with the help of a protractor) equal to the degrees
represented by the first component ; the new line drawn at the
centre to form this angle will touch the circumference. The sector
so obtained will represent the proportion of the first component.

(v) Different sectors representing various component parts
should be distinguished from one another by using different shades,
dottings, colours, etc., or by giving them explanatory or descriptive
labels either inside the sector (if possible) or just outside the circle
with proper identification.
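Steps (i) to (iii) above can be sketched in a few lines. The component values and totals below are hypothetical and chosen only to keep the arithmetic easy; the function names are ours.

```python
import math

def pie_angles(components):
    """Return {name: (percentage, degrees)} for a dict of component values."""
    total = sum(components.values())
    result = {}
    for name, value in components.items():
        percent = 100 * value / total   # step (i): percentage of the total
        degrees = percent * 3.6         # step (ii): 1 per cent = 3.6 degrees
        result[name] = (percent, degrees)
    return result

def comparative_radii(totals, base_radius=1.0):
    """Step (iii): radii proportional to the square roots of the totals.
    The smallest total is given the radius base_radius."""
    smallest = min(totals.values())
    return {name: base_radius * math.sqrt(t / smallest) for name, t in totals.items()}

parts = {"X": 50, "Y": 30, "Z": 20}   # hypothetical component values
angles = pie_angles(parts)
for name, (percent, degrees) in angles.items():
    print(f"{name}: {percent:.1f} per cent -> {degrees:.1f} degrees")

# Two hypothetical sets of data whose totals are in the ratio 1 : 4.
print(comparative_radii({"first set": 100, "second set": 400}))
```

With totals in the ratio 1 : 4 the radii come out in the ratio 1 : 2, so that the areas of the two circles are in the ratio of the totals.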


Remarks 1. The degrees represented by the various component
parts of a given magnitude can be obtained directly without
computing their percentage to the total value as follows :

Degree of any component part = (Component value / Total value) × 360°

2. Pie diagrams are also called circular diagrams.

3. Since the comparison of the pie diagrams is to be made
on the basis of the areas of the circles and of various sectors which

We give below some illustrations of pie diagrams.


Example 4.16. Draw a pie diagram for the following data
of Sixth Five Year Plan Public Sector Outlays :

Agriculture and Rural Development      12.9%
Irrigation etc.                        12.5%
Energy                                 27.2%
Industry and Minerals                  15.4%
Transport, Communications etc.         15.9%
Social Services and Others             16.1%

(Bombay U. B. Com. April 1981)

Solution. The angle at the centre is given by

(Percentage outlay / 100) × 360° = Percentage outlay × 3.6°

COMPUTATIONS FOR PIE DIAGRAM

Sector                               Percentage    Angle at the centre
(1)                                  outlay (2)    (3) = (2) × 3.6°

Agriculture and Rural Development    12.9          12.9 × 3.6 ≈ 46°
Irrigation etc.                      12.5          12.5 × 3.6 = 45°
Energy                               27.2          27.2 × 3.6 ≈ 98°
Industry and Minerals                15.4          15.4 × 3.6 ≈ 56°
Transport, Communications etc.       15.9          15.9 × 3.6 ≈ 57°
Social Services and Others           16.1          16.1 × 3.6 ≈ 58°

Total                                100.0         360°


PIE DIAGRAM SHOWING SIXTH FIVE YEAR
PLAN PUBLIC SECTOR OUTLAYS


Fig. 4.21

Example 4.17. The following table gives the distribution of
outlay in a Five Year Plan of India under the major heads of
development expenditure :

Head of Expenditure                          Expenditure
                                             (in crores of Rs.)
(a) Agriculture and community development     8,000
(b) Irrigation and power                      4,000
(c) Industry and Mining                       7,000
(d) Transport and communication               5,500
(e) Miscellaneous                             2,500

Total                                        27,000

Represent the above information by a Pie-chart.
[I.C.W.A. (Intermediate) June 1983]

Solution.
CALCULATIONS FOR PIE CHART

Head of Expenditure                          Expenditure          Angle at the centre
                                             (in crores of Rs.)
(1)                                          (2)                  (3) = (2)/27,000 × 360°

(a) Agriculture and community development     8,000               106.67° ≈ 107°
(b) Irrigation and power                      4,000                53.33° ≈ 53°
(c) Industry and Mining                       7,000                93.33° ≈ 93°
(d) Transport and communication               5,500                73.33° ≈ 73°
(e) Miscellaneous                             2,500                33.33° ≈ 34°

Total                                        27,000               360°

PIE CHART SHOWING EXPENDITURE ON
MAJOR HEADS IN A FIVE YEAR PLAN

Fig. 4.22
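The angles in Examples 4.16 and 4.17 can be re-computed with the direct formula of Remark 1. The sketch below rounds every angle to the nearest degree; the function name is ours. In both examples plain rounding gives a total of 359°, which would explain why one sector in each solution (56° and 34° respectively) is taken one degree higher so that the sectors close the circle. That reading is our inference and is not stated in the solutions.

```python
def rounded_angles(values):
    """Angle of each component, (value / total) x 360, rounded to the nearest degree."""
    total = sum(values)
    return [round(v / total * 360) for v in values]

outlay_4_16 = [12.9, 12.5, 27.2, 15.4, 15.9, 16.1]   # percentages, Example 4.16
outlay_4_17 = [8000, 4000, 7000, 5500, 2500]         # crores of Rs., Example 4.17

for data in (outlay_4_16, outlay_4_17):
    angles = rounded_angles(data)
    print(angles, "total =", sum(angles))
```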


Example 4.18. The following data show the expenditure on
various heads in the first three five year plans (in crores of rupees).

                                       Expenditure (in crores Rs.)
                                       Third plan

Agriculture and C.D.
Irrigation and power
Village and small industries            264
Industry and minerals                  1520
Transport and communication            1486
Social services and miscellaneous      1500

Represent the data by angular (pie) diagrams.
Solution. (Cont. on page 148).

EXPENDITURE ON VARIOUS HEADS IN FIRST THREE FIVE YEAR PLANS

1ST PLAN          SECOND PLAN          THIRD PLAN

Legend : Agriculture and C.D. ; Irrigation and Power ; Village and
Small Industries ; Industry and Minerals ; Transport and
Communication ; Social Services and Miscellaneous.

Fig. 4.23

4.3.4. Three Dimensional Diagrams. Three dimensional
diagrams, also termed volume diagrams, are those in which three
dimensions, viz., length, breadth and height, are taken into account.
They are constructed so that the given magnitudes are represented
by the volumes of the corresponding diagrams. The common forms
of such diagrams are cubes, spheres, cylinders, blocks, etc. These
diagrams are specially useful if there are very wide variations
between the smallest and the largest magnitudes to be represented. Of
the various three dimensional diagrams, 'cubes' are the simplest and
most commonly used devices of diagrammatic presentation of the
data.

Cubes. For instance, if the smallest and the largest magnitudes
to be presented are in the ratio of 1 : 1000, bar diagrams cannot
be used because the height of the biggest bar would be 1000
times the height of the smallest bar and thus they would look very
disproportionate and clumsy. On the other hand, if square or circle
diagrams are used, then the sides (radii) of the squares (circles) will
be in the ratio of the square roots, viz., 1 : √1000, i.e., 1 : 31.62, i.e.,
1 : 32 (approx.), which will again give quite disproportionate
diagrams. However, if cubes are used to present this data, then since
the volume of a cube of side x is x³, the sides of the cubes will be in
the ratio of their cube roots, viz., 1 : ∛1000, i.e., 1 : 10, which will
give reasonably proportionate diagrams as compared to one
dimensional or two dimensional diagrams.
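The comparison of the three kinds of diagram for magnitudes in the ratio 1 : 1000 can be reproduced as follows. This is only an illustrative sketch; the function name is ours.

```python
def linear_ratio(magnitude_ratio, dimensions):
    """Ratio of the linear sizes (height, side or radius) needed to show a given
    ratio of magnitudes with a one, two or three dimensional diagram."""
    return magnitude_ratio ** (1 / dimensions)

for name, d in [("bar", 1), ("square or circle", 2), ("cube", 3)]:
    print(f"{name:16s} 1 : {linear_ratio(1000, d):.2f}")
```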

Construction of a Cube of Side 'x'. The various steps are
outlined below :

(i) Construct a square ABCD of side x.

(ii) Draw EF as right bisector, i.e., perpendicular bisector, of
AB [This is done by finding the mid-point of AB and then drawing
a perpendicular at that point to the line AB] such that EF = AB and
half of it is above AB and half of it is below AB.

Fig. 4.24

(iii) Join AE, CF and EF.

(iv) Through B draw a line BG parallel to AE (i.e., BG || AE)
such that BG = AE.

(v) Join EG and through G draw a line GH || EF such that
GH = EF.

(vi) Join D and H.

(vii) Rub off the lines CF, EF and FH. Now CDHGEA is
the required cube.

Remarks 1. As already discussed, three dimensional diagrams
are used with advantage over one or two dimensional diagrams
if the range, i.e., the gap between the smallest and the largest
observation to be presented, is very large. Moreover, they are more
beautiful and appealing to the eye than bars, rectangles, squares or
circles. However, since three dimensional diagrams are quite
difficult to construct and comprehend as compared to one or two
dimensional diagrams, they are not very popular. Further, as the
magnitudes are represented by the volumes of the cubes (volumes
of the three dimensional diagrams, in general), it is very difficult to
visualise and hence interpret them with precision.

2. Cylinders, spheres and blocks are quite difficult to construct
and are, therefore, not discussed here.

3. It is worthwhile pointing out here that nowadays projection
techniques are used to represent even one dimensional diagrams
as three dimensional diagrams for giving them a beautiful
and attractive get up.


Example 4.19. The following table gives the population of
India on the basis of religion.

Religion          Number (in lakhs)
Hinduism          2031.9
Islam              354.0
Christianity        81.6
Sikhism             62.2
Others              36.3

Represent the data by cubes.

Solution. The sides of the cubes will be proportional to the
cube roots of the magnitudes they represent.

Note. To compute the cube root of a, viz., ∛a or a^(1/3), let

y = a^(1/3)
Taking logarithm of both sides,
log₁₀ y = (1/3) log₁₀ a
⇒ y = Antilog [(1/3) log₁₀ a]

COMPUTATION OF CUBE ROOTS

a          log₁₀ a    (1/3) log₁₀ a    Antilog [(1/3) log₁₀ a]    Ratio of sides

2031.9     3.3079     1.1026           12.67                      3.74 ≈ 3.7
 354.0     2.5441     0.8480            7.047                     2.08 ≈ 2.1
  81.6     1.9117     0.6372            4.337                     1.28 ≈ 1.3
  62.2     1.7938     0.5979            3.962                     1.17 ≈ 1.2
  36.3     1.5599     0.5199            3.388                     1

Now we can express the data diagrammatically by drawing
cubes with sides proportional to the values given in the last
column.

SCALE : 1/8 CU. CM. = 36.3 LAKHS
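The logarithmic method of the Note can be sketched as below, using the library logarithm in place of four-figure tables. The function name is ours. A reader checking the table this way should expect some entries to differ from the tabulated ones; for instance, direct computation gives about 7.07 for 354.0 and about 3.31 for 36.3, and the ratios of sides change accordingly.

```python
import math

def cube_root_via_logs(a):
    """Cube root of a, computed as antilog[(1/3) log10 a]."""
    return 10 ** (math.log10(a) / 3)

population = {          # in lakhs, from Example 4.19
    "Hinduism": 2031.9,
    "Islam": 354.0,
    "Christianity": 81.6,
    "Sikhism": 62.2,
    "Others": 36.3,
}

roots = {name: cube_root_via_logs(a) for name, a in population.items()}
smallest = min(roots.values())
for name, r in roots.items():
    # cube root, followed by the ratio of the side to the smallest side
    print(f"{name:12s} {r:7.3f} {r / smallest:6.2f}")
```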


represented in pictograms by 3 ships and about a half more ; 231
vessels will be represented in pictogram by 4 ships and a
proportionate fraction of the 5th. This proportionate representation
introduces error and is quite difficult to visualise with precision. We give
below some illustrations of pictograms.
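The splitting of a magnitude into whole symbols and a fraction of one more symbol can be sketched as follows. The scale of 50 vessels per ship is an assumption inferred from the figures quoted above (3 ships and about a half for 174 vessels, 4 ships and a fraction for 231 vessels); the function name is ours.

```python
def pictogram_symbols(value, per_symbol):
    """Split a value into the number of whole symbols and the fraction
    of one further symbol that must be drawn."""
    whole, remainder = divmod(value, per_symbol)
    return int(whole), remainder / per_symbol

VESSELS_PER_SHIP = 50   # assumed scale, not stated in the surviving text

print(pictogram_symbols(174, VESSELS_PER_SHIP))
print(pictogram_symbols(231, VESSELS_PER_SHIP))
```

The fractional part is what has to be judged by eye from a partly drawn symbol, which is the source of the imprecision mentioned above.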
The following table gives the number of students studying in
schools/colleges for different years in India.

STUDENTS ON ROLL AT THE SCHOOL/UNIVERSITY STAGE
(As on 31st March)
                                                          (Million)
Stage                 1960-61   1965-66   1970-71   1974-75   1975-76
Class I to XI          44.73     66.29     76.95     87.30     89.46
University/College      0.73      1.24      2.81      2.94      3.21

Source : Ministry of Education and Social Welfare.

The above data can be represented in the form of a
pictogram (pictures) as given below :

STUDENTS ON ROLL AT SCHOOL/UNIVERSITY STAGE

[Pictogram with one row of symbols for each of the years 1960-61,
1965-66, 1970-71, 1974-75 and 1975-76. In the University/College
panel each symbol = 0.5 million.]

Fig. 4.26

Data pertaining to the population of a country at different age
groups are usually represented by means of so-called
pyramids. The pyramid in Fig. 4.27 exhibits the population of
India at various age-groups according to the 1971 census as given
in the table below.

DISTRIBUTION OF POPULATION BY AGE GROUPS
(1971 Census)

Age-group         Population (in '000)
Under 1             16,519
1-4                 63,040
5-14              1,50,776
15-24               90,569
25-34               77,010
35-44               61,186
45-54               43,416
55-64               27,202
65-74               12,800
75 and over          5,446

Source : Registrar General of India.

AGE PYRAMIDS - INDIA - CENSUS 1971
POPULATION IN MILLIONS

Fig. 4.27


Example 4.20. The following table gives the number of vessels
in Indian Merchant Shipping fleet for different years.

Year      No. of vessels as on 31st December
1961      174
1966      231
1971      5
1975      3
1976      359

Represent the data by pictogram.
Solution.

NUMBER OF VESSELS IN INDIAN MERCHANT SHIPPING FLEET

Fig. 4.28

4.3.6. Cartograms. In cartograms, statistical facts are
presented through maps accompanied by various types of
diagrammatic representation. They are specially used to depict
quantitative facts on a regional or geographical basis, e.g., the population
density of different states in a country or different countries in the
world, or the distribution of the rainfall in different regions of a
country, can be shown with the help of maps or cartograms. The
different regions or geographical zones are depicted on a map and
the quantities or magnitudes in the regions may be shown by dots,
different shades or colours, etc., or by placing bars or pictograms
in each region, or by writing the magnitudes to be represented in
the respective regions. Cartograms are simple and elementary forms
of visual presentation and are easy to understand. They are
generally used when regional or geographic comparisons are to be
highlighted.


4.3.7. Choice of a Diagram. In the previous sections we
have described different types of diagrams which can be used to
present a given set of numerical data and also discussed briefly
their relative merits and demerits. No single diagram is suited for
all practical situations. The choice of a particular diagram for
visual presentation of a given set of data is not an easy one and
requires great skill, intelligence and expertise. The choice will
primarily depend upon the nature of the data and the object of
presentation, i.e., the type of the audience to whom the diagrams
are to be presented, and it should be made with utmost care and
caution. A wrong or injudicious selection of the diagram will distort
the true characteristics of the phenomenon to be presented and
might lead to very wrong and misleading interpretations. Some
special types of data, viz., the data relating to frequency curves and
time series, are best represented by means of graphs which we will
discuss in the following sections.


Exercise 4.1 


1. (a) What are the merits and limitations of diagrammatic
representation of statistical data ?
(Nagarjuna U. B. Com. Oct. 1981)

(b) Describe the advantages of diagrammatic representation of
statistical data. Name the different types of diagrams commonly used and mention
the situations where the use of each type of diagram would be appropriate.
[I.C.W.A. (Intermediate) June, 1975]

2. (a) What are the different types of diagrams which are used in
statistics to show the salient characteristics of groups and series ? Illustrate your
answer.
[Delhi U. B.Com. (Hons.) 1976]

(b) ... of diagrammatic representation of facts.
[Delhi U. B.Com. (Hons.) 1977, 73]

3. "Diagrams do not add anything to the meaning of Statistics but when
drawn and studied intelligently, they bring to view the salient characteristics of
groups and series."
[Osmania U. B.Com. (Hons.) Nov. 1981]

4. "Diagrams help us visualize the whole meaning of a numerical
complex at a single glance." Comment.
[Osmania U. B.Com. (Hons.) April 1983]

5. The merits of diagrammatic presentation of data are classified under
three heads : attraction, effective impression and comparison. Explain and
illustrate these points.

6. (a) State the different methods used for diagrammatic representation
of statistical data and indicate briefly the advantages and disadvantages of them.
(Kurukshetra U. B. Com. Sept. 1980)

(b) Point out the usefulness of diagrammatic representation of facts and
explain the construction of any one of the different forms of diagrams you
know.
[Nagarjuna U. B. Com. Oct. 1980]

7. What types of mistakes are commonly committed in the construction
of diagrams ? What precautions are necessary in this connection ?

8. (a) Draw a bar chart to represent the following information :

Year                   1952   1957   1962   1967   1972   1977
No. of women M.P.'s      22     27     34     31     22     19

(b) Represent the following data with the help of a bar diagram :

Year        Notes in Circulation      Year        Notes in Circulation
            (Rs. crores)                          (Rs. crores)
1970-71     4,221                     1974-75     6,231
1971-72     4,655                     1975-76     6,572
1972-73     5,272                     1976-77     7,778
1973-74     6,159

Source : RBI Bulletin, March, 1977.

(c) In a recent study on causes of strikes in mills, an experimenter
collected the following data.

Causes :            Economic   Personal   Political   Rivalry   Others
Occurrences            58         16         10          6        10
(in percentage)

Represent the data by bar chart.

(d) Below are data on the number of films made in different regional
and/or other languages in India in different years.

Year :           1947   1951   1961   1970   1971   1972   1973
No. of films :    281    229    303    396    433    414    448

Draw a bar chart to represent the above data.

9. (a) Draw a percentage bar diagram to represent the following data.

Items of Expenditure       Income in Rupees
                           Family A     Family B
Food                         400          480
Clothing                     200          400
House Rent                   160          200
Fuel                          80          120
Miscellaneous                160          400
Total                      1,000        1,600

[Mysore U. B. Com. November 1981]

(b) Represent the data relating to the cost of construction of two tables by
means of a percentage diagram.

                       Table I     Table II
                       (Rs.)       (Rs.)
Wood                     5           10
Other materials          2            8
Labour                   2            5
Other expenses           1            2
Total                   10           25

[Osmania U. B. Com. (Hons.) April 1983]


(c) Draw a suitable diagram to represent the following data on livelihood
patterns in India, U.S.A. and U.K.

Occupation                              India    U.S.A.    U.K.
(1) Agriculture and Forestry             71%      13%       5%
(2) Manufacture and Commerce             15%      46%      55%
(3) Other Industries and Services        14%      41%      40%
Total                                   100%     100%     100%

10. (a) Construct a multiple bar graph to represent India's Imports and
Exports for the following years :

Year                                  1972-73  1973-74  1974-75  1975-76  1976-77
Imports (in 100 crores of rupees)        19       20       45       53       51
Exports (in 100 crores of rupees)        20       25       33       40       51

(b) Draw a suitable diagram to present the following data :

                              1978    1979
I Division                     16      12
II Division                    40      44
III Division                   60      72
Failures                       44      34
Total No. of candidates       160     162

(Nagarjuna U. B. Com. April 1980)


11. (a) Represent the following data by a diagram :

Price per unit of commodity     Rs. 4/-      Rs. 6/-
Quantity                        40 units     20 units
Value of raw material           Rs. 52/-     Rs. 48/-
Other expenses                  Rs. 64/-     Rs. 42/-
Profit                          Rs. 44/-     Rs. 30/-

[Punjab U. B. A. (Econ. Hons.) 1982]

(b) Represent the following data regarding monthly expenditure of two
families by a suitable diagram :

Items of expenditure            Family A            Family B
                                Income Rs. 500      Income Rs. 800
1. Food                         Rs. 200             Rs. 250
2. Clothing                         100                 200
3. House Rent                        80                 100
4. Fuel and light                    40                  50
5. Miscellaneous including
   Savings                           80                 200
                                    500                 800

[Punjab U. B. A. (Econ. Hons.) 1981]

(c) Represent the following data by a sub-divided Bar-diagram :

                       No. of students
College    Arts    Science    Commerce    Agriculture    Total
A          1200      800        600          400         3000
B           750      500        300          450         2000

(Delhi U. B. Com. 1982)

12. Draw a rectangular diagram to represent the following information :

                        Factory A      Factory B
Price per unit          Rs. 15.00      Rs. 12.00
Units produced          1000 Nos.      1200 Nos.
Raw material/unit       Rs. 5.00       Rs. 5.00
Other expenses/unit     Rs. 4.00       Rs. 3.00
Profit/unit             Rs. 6.00       Rs. 4.00

[Bombay U. B. Com. May 1978]


13. Describe bar diagrams used in Statistics. The following table gives the
monthly expenditure of two families A and B. Compare these figures by a
suitable diagram.

Items of               Family A           Family B
expenditure            income Rs. 500     income Rs. 800
Food                   140                240
Clothing                80                160
House Rent             100                120
Education               30                 80
Fuel and Lighting       40                 40
Miscellaneous           40                 80

[Bangalore U. B. Com., May 1979 ; Gujarat U. B. Com., April 1978]


14. (a) What do you mean by two-dimensional diagrams ? Under what
situations are they preferred to one dimensional diagrams ?

(b) Describe the (i) square and (ii) circle diagrams. Discuss their

(c) What do you understand by Pie diagrams ? Discuss the technique of
constructing such diagrams.

15. The following table gives the average approximate yield of rice in
kg. per acre in three different countries. Draw square diagrams to represent the
data :

Country                     A      B       C
Yield (in kg.) per acre    350    647    1120

[Bombay U. B. Com. October 1973]




16. Represent the following data on production of Tea, Cocoa and
Coffee by means of a pie diagram.

Tea             Cocoa           Coffee         Total
3,260 tons      1,850 tons      900 tons       6,010 tons

[Delhi U. B. Com. (Hons.) 1980]

17. (a) Point out the usefulness of diagrammatic representation of facts
and explain the construction of volume and pie diagrams.
[Punjab U. B. A. (Econ. Hons.) 1981]

(b) A Rupee spent on 'Khadi' is distributed as follows :

                                   Paise
Farmer                               19
Carder and Spinner                   35
Weaver                               28
Washerman, Dyer and Printer           8
Administrative Agency                10
Total                               100

Present the data in the form of a pie diagram.

18. (a) Draw a Pie diagram to represent the distribution of a certain
blood group 'O' among Gypsies, Indians and Hungarians.

                          Frequency
Blood group    Gypsies    Indians    Hungarians    Total
O                343        313         344        1000

[I.C.W.A. (Intermediate) Dec. 1979]

(b) A ship has four compartments labelled 1, 2, 3 and 4. The space
limits of 1, 2, 3 and 4 are respectively 1,80,000 cubic feet, 1,60,000 cubic feet,
1,40,000 cubic feet and 1,20,000 cubic feet. Present the data about the different
space limits in a table and draw a Pie diagram to represent the above data.
[I.C.W.A. (Intermediate) June 1980]

(c) Represent the following data of the distribution of income of a
leading company by a suitable diagram.

                              Rupees in Lakhs
Raw materials                     1,689
Taxes                               582
Manufacturing expenses              543
Employees                           470
Other expenses                      286
Depreciation                         94
Dividends                            75
Retained Income                      51
                                  3,790

[Osmania U. B. Com. (Hons.) Nov. 1981]


19. (a) Represent the following data by means of circular diagrams :

                         No. of Employed
Year      Men          Women        Children     Total
1951      1,80,000     1,10,000       70,000     3,60,000
1961      3,50,000     2,10,000     1,60,000     7,20,000

[I.C.W.A. (Intermediate) June 1981]

(b) Represent the following data by Pie-diagram :

Item of Expenditure         Expenditure (in Rs.)
                            Family A     Family B
Food                          150          120
Clothing                      100           80
Rent and Education            120           80
Fuel and Electricity           80           40
Others                         90           40

(Mysore U. B.Com. Oct. 1980)

(c) Represent the following data by a suitable diagram :

ELECTRICITY UNDERTAKING (DEC. 1976)
NUMBER OF UNDERTAKINGS

           Private Sector     Public Sector     Total
Steam           23                 33             56
Oil            154                212            366
Hydro            2                 27             29
               179                272            451

Hint : Sub-divided Bar or Pie chart. (Bombay U. B. Com. April 1976)


4.4. Graphic Representation of Data. The difference between
diagrams and graphs has been discussed in § 4.2. To
summarise, diagrams are useful for visual presentation of categorical
and geographical data while the data relating to time series and
frequency distributions are best represented through graphs.
Diagrams are primarily used for comparative studies and cannot be used
to study the relationship (not necessarily functional) between the
variables under study. This is done through graphs. Diagrams
furnish only approximate information and are not of much utility to
a statistician from the analysis point of view. On the other hand,
graphs are more obvious, precise and accurate than diagrams and
can be effectively used for further statistical analysis, viz., to study
slopes, rates of change and for forecasting wherever possible.
Graphs are drawn on a special type of paper, known as graph paper.
The advantages of graphic representation of a set of numerical data
have also been discussed in § 4.1.

Like diagrams, a large number of graphs are used in practice.
But they can be broadly classified under the following two heads :
(i) Graphs of Frequency Distributions,
(ii) Graphs of Time Series.

Before discussing these graphs we shall briefly describe the
technique of constructing graphs and the general rules for drawing
graphs.


4.4.1. Technique of Construction of Graphs. Graphs
are drawn on a special type of paper known as graph paper which
has a fine network of horizontal and vertical lines ; the thick lines
for each division of a centimetre or an inch measure and thin lines
for small parts of the same. In a graph of any size, two simple
lines are drawn at right angles to each other, intersecting at the point
'O' which is known as the origin or zero of reference. The two lines are
known as co-ordinate axes. The horizontal line is called the X-axis
and is denoted by X'OX. The vertical line is called the Y-axis and
is usually denoted by YOY'. Thus the graph is divided into four
sections, known as the four quadrants, but in practice only the first
quadrant is generally used unless negative magnitudes are to be
displayed. Along the X-axis, the distances measured towards the right
of the origin, i.e., towards the right of the line YOY', are positive and
the distances measured towards the left of the origin, i.e., towards the
left of the line YOY', are negative, the origin showing the value zero.
Along the Y-axis, the distances above the origin, i.e., above the line
X'OX, are positive and the distances below the origin, i.e., below the
line X'OX, are negative. Any pair of the values of the variables is
represented by a point (x, y) ; x usually represents the value of the
independent variable and is shown along the X-axis, and y represents
the value of the dependent variable and is shown along the
Y-axis. The four quadrants along with the signs of the x and y values
are shown in Fig. 4.29.

QUADRANT II                    QUADRANT I
x-Negative, y-Positive         x-Positive, y-Positive
(-x, +y)                       (+x, +y)

QUADRANT III                   QUADRANT IV
x-Negative, y-Negative         x-Positive, y-Negative
(-x, -y)                       (+x, -y)

Fig. 4.29

In any pair (a, b), the first coordinate, viz., 'a', always refers to
the X-coordinate which is also known as the abscissa, and the second
coordinate, viz., 'b', always refers to the Y-coordinate which is also
known as the ordinate. As an illustration, the points P(4, 2), Q(-3, 4),
R(3, -2) and S(-2, -3) are displayed in the following diagram.

Fig. 4.30
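The sign rules for the four quadrants can be expressed as a small sketch and applied to the four points P, Q, R and S mentioned above. The function name is ours.

```python
def quadrant(x, y):
    """Return the quadrant (I to IV) in which the point (x, y) lies."""
    if x == 0 or y == 0:
        return "on an axis"
    if x > 0:
        return "I" if y > 0 else "IV"
    return "II" if y > 0 else "III"

points = {"P": (4, 2), "Q": (-3, 4), "R": (3, -2), "S": (-2, -3)}
for name, (x, y) in points.items():
    print(f"{name}({x}, {y}) lies in quadrant {quadrant(x, y)}")
```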


In the graph on a natural or arithmetic scale, the equal
magnitudes of the values of the variables are represented by equal
distances along both the axes, though the scales along the X-axis and
the Y-axis may be different depending on the nature of the phenomenon
under consideration.

4.4.2. General Rules for Graphing. The following guidelines
(some of which have already been discussed in § 4.3.1 for
diagrammatic representation of data) may be kept in mind for
drawing effective and accurate graphs :

1. Neatness. (For details see § 4.3.1.)

2. Title and Footnote. (For details see § 4.3.1.)

3. Structural Framework. The position of the axes should
be so chosen that the graph gives an attractive and proportionate
get up. It should be kept in mind that for each and every value of
the independent variable, there is a corresponding value of the
dependent variable. In drawing the graph it is customary to plot
the independent variable along the X-axis and the dependent
variable along the Y-axis. For instance, if the data pertaining to the
prices of a commodity and the quantity demanded or supplied at
different prices are to be plotted, then the dependent variable, viz.,
price (which depends on independent forces of supply and demand),
is taken along the Y-axis while the independent variable, viz., quantity
demanded or supplied, is taken along the X-axis. Similarly, in case of
time series data, the time factor is taken along the X-axis and the
phenomenon which changes with time, e.g., population of a country in
different years, production of a particular commodity for different
periods, etc., is taken along the Y-axis.


4. Scale. This point has also been discussed in § 4.3.1. It
may further be added that the scale along both the axes (X-axis
and Y-axis) should be so chosen that the entire data can be
accommodated in the available space without crowding. In this
connection, it is worthwhile to quote the words of A. L. Bowley.

"It is difficult to lay down rules for the proper choice of
scales by which the figures should be plotted out. It is only the
ratio between the horizontal and vertical scales that need to be
considered. The figure must be sufficiently small for the whole of it to
be visible at once : if the figure is complicated, related to long series
of years and varying numbers, minute accuracy must be sacrificed
to this consideration. Supposing the horizontal scale is decided, the
vertical scale must be chosen so that the part of the line which
shows the greatest rate of increase is well inclined to the vertical
which can be managed by making the scale sufficiently small ; and
on the other hand, all important fluctuations must be clearly visible
for which the scale may need to be decreased. Any scale which
satisfies both these conditions will fulfil its purpose."


5. False Base Line. The fundamental principle of drawing
a graph is that the vertical scale must start with zero. If the
fluctuations in the values of the dependent variable (to be shown along
the Y-axis) are very small relative to their magnitudes, and if the
minimum of these values is very distant (far greater) from zero, the
point of origin, then for an effective portrayal of these fluctuations
the vertical scale is stretched by using a false base line. In such a
situation the vertical scale is broken and the space between the
origin 'O' and the minimum value (or some convenient value near
that) of the dependent variable is omitted by drawing two zig-zag
horizontal lines above the base line. The scale along the Y-axis is then
framed accordingly. The false base line technique is quite extensively
used for magnifying the minor fluctuations in time series data. It
also economises space because if such data are graphed without
using a false base line, then the plotted data will lie at the top of the
graph. This will give a very clumsy look and also result in
wastage of space. However, proper care should be taken to interpret
graphs in which a false base line is used. As illustrations, see
Examples 4.34 and 4.35 in § 4.4.4.

6. Ratio or Logarithmic Scale. In order to display
proportional or relative changes in the magnitudes, the ratio or
logarithmic scale should be used instead of the natural or arithmetic scale
which is used to display absolute changes. Ratio scale is discussed
in detail in § 4.4.5.

7. Line Designs. If more than one variable is to be depicted
on the same graph, the different graphs so obtained should be
distinguished from each other by the use of different lines, viz., dotted
lines, broken lines, dash-dot lines, thin or thick lines, etc., and an
index to identify them should be given. [See Examples 4.38 and
4.39.]

8. Source Note and Number. For details see § 4.3.1.

9. Index. For details see § 4.3.1.

10. Simplicity. For details see § 4.3.1.

Remark. For detailed discussion on items 1, 2, 4, 9 and 10
see § 4.3.1, replacing the word 'diagrams' by 'graphs'.


4.4.3. Graphs of Frequency Distributions. The reasons and the
guiding principles for the graphic representation of frequency
distributions are precisely the same as for the diagrammatic and
graphic representation of other types of data. The so-called
frequency graphs are designed to reveal clearly the characteristic
features of frequency data. Such graphs are more appealing to the eye
than the tabulated data and are readily perceptible to the mind. They
facilitate comparative study of two or more frequency distributions
regarding their shape and pattern. The most commonly used graphs for
charting a frequency distribution are :

(i) Histogram.
(ii) Frequency Polygon.
(iii) Frequency Curves.
(iv) ‘Ogives’ or Cumulative Frequency Curves.

The choice of a particular graph for a given frequency distribution
largely depends on the nature of the frequency distribution, viz.,
discrete or continuous. In the following sections we shall discuss
them in detail, one by one.

A. HISTOGRAM 

It is one of the most popular and commonly used devices for charting
a continuous frequency distribution. It consists in erecting a series
of adjacent vertical rectangles on the sections of the horizontal
axis (X-axis), with bases (sections) equal to the widths of the
corresponding class intervals and heights so taken that the areas of
the rectangles are equal to the frequencies of the corresponding
classes.

Construction of Histogram. The variate values are taken along the
X-axis and the frequencies along the Y-axis.

Case (i) Histogram with equal classes. If the classes are of equal
magnitude throughout, each class interval is drawn on the X-axis by a
section or base (of the rectangle) which is equal (or proportional)
to the magnitude of the class interval. On each class interval (as
base) erect a rectangle with height proportional to the corresponding
frequency of the class. The series of adjacent rectangles (one for
each class) so formed gives the histogram of the frequency
distribution, and its area represents the total frequency of the
distribution as distributed throughout the different classes. The
procedure is explained in Examples 4.21 and 4.22.


Case (ii) Histogram with unequal classes. If the classes are not
uniform throughout, then, as in case (i), the different classes are
represented on the X-axis by sections or bases which are equal (or
proportional) to the magnitudes of the corresponding classes, and the
heights of the corresponding rectangles are adjusted so that the area
of each rectangle is equal to the frequency of the corresponding
class. This adjustment can be done by taking the height of each
rectangle proportional (equal) to the frequency density of the class,
which is obtained on dividing the frequency of the class by its
magnitude, viz.,

$$\text{Frequency density (of a class)} = \frac{\text{Frequency of the class}}{\text{Magnitude of the class}}$$

Instead of finding the frequency density, a more convenient way (from
the practical point of view) is to make all the class intervals equal
and then adjust the corresponding frequencies, using the basic
assumption that the frequencies are distributed uniformly throughout
each class. This consists in taking the lowest (smallest) class
interval as the standard one, with unit length on the X-axis. The
adjusted frequencies of the different classes are obtained on
dividing the frequency of the given class by the corresponding
Adjustment Factor (A.F.), which is given by :

$$\text{A.F. for any class} = \frac{\text{Magnitude of the class}}{\text{Lowest class interval}}$$

Thus, if the magnitude of any class interval is twice (three times)
the lowest class interval, the adjustment factor is 2 (3) and the
height of the rectangle, which is represented by the adjusted
frequency, will be one-half (one-third) of the corresponding class
frequency, and so on. This is illustrated in Examples 4.23 and 4.24.
This adjustment gives rectangles whose areas are equal (proportional)
to the frequencies of the corresponding classes.
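The adjustment described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `adjusted_frequencies` and the four classes with their frequencies are hypothetical and are not taken from the examples of this chapter.

```python
from fractions import Fraction

def adjusted_frequencies(classes, freqs):
    """Return a list of (adjustment factor, adjusted frequency) pairs.

    classes : list of (lower, upper) class boundaries (integers here)
    freqs   : list of class frequencies
    The smallest class magnitude is taken as the standard (unit) width.
    """
    widths = [upper - lower for lower, upper in classes]
    smallest = min(widths)
    result = []
    for width, f in zip(widths, freqs):
        af = Fraction(width, smallest)          # A.F. = magnitude / lowest class interval
        result.append((af, Fraction(f) / af))   # adjusted frequency = f / A.F.
    return result

# Hypothetical data: two classes of magnitude 10, one of 20 and one of 30.
classes = [(0, 10), (10, 20), (20, 40), (40, 70)]
freqs = [8, 12, 20, 15]

table = adjusted_frequencies(classes, freqs)
for (lower, upper), f, (af, height) in zip(classes, freqs, table):
    print(f"{lower}-{upper}: f = {f}, A.F. = {af}, height = {height}")
```

With the standard width taken as the unit of length, the area of each rectangle (height multiplied by A.F.) comes back to the original class frequency, which is the property the adjustment is meant to secure.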


Remarks 1. Grouped (Not Continuous) Frequency Distribution. It should
be clearly understood that a histogram can be drawn only if the
frequency distribution is continuous. In the case of a grouped
frequency distribution whose classes are not continuous, they should
be made continuous by changing the class limits into class
boundaries, and then rectangles should be erected on the continuous
classes so obtained. As an illustration, see Example 4.26.

2. Mid-points given. Sometimes only the mid-values of the different
classes are given. In such a case, the given distribution is
converted into a continuous frequency distribution with exclusive
type classes by ascertaining the upper and lower limits of the
various classes, under the assumption that the class frequencies are
uniformly distributed throughout each class. (See Example 4.27.)

3. Discrete Frequency Distribution. Histograms may sometimes also be
used to represent a discrete frequency distribution by regarding the
given values of the variable as the mid-points of continuous classes
and then proceeding as explained in Remark 2 above.

4. Difference between Histogram and Bar Diagram. (i) A histogram is
a two-dimensional (area) diagram in which both the width (base) and
the length (height of the rectangle) are important, whereas a bar
diagram is a one-dimensional diagram in which only the length (height
of the bar) matters while the width is arbitrary.

(ii) In a histogram, the bars (rectangles) are adjacent to each
other, whereas in a bar diagram proper spacing is given between
different bars.

(iii) In a histogram, the class frequencies are represented by the
areas of the rectangles, while in a bar diagram they are represented
by the heights of the corresponding bars.

5. Open-end classes. Histograms cannot be constructed for frequency
distributions with open-end classes unless we assume that the
magnitude of the first open class is the same as that of the
succeeding (second) class and the magnitude of the last open class is
the same as that of the preceding (i.e., last but one) class.


6. Histogram may be used for the graphic location of the 
value of Mode (See Chapter 5). 
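Remarks 1 and 2 above both amount to simple arithmetic on the class limits, which may be sketched as follows. The helper names `limits_to_boundaries` and `midpoints_to_classes` are hypothetical, equal class magnitudes and equal gaps between classes are assumed, and the short lists of limits and mid-values are chosen only for illustration.

```python
def limits_to_boundaries(classes):
    """Convert inclusive class limits, e.g. (20, 24), (25, 29), ...,
    into class boundaries by sharing the gap between consecutive
    classes equally (assumes the same gap between all classes)."""
    gap = classes[1][0] - classes[0][1]   # lower limit of 2nd class - upper limit of 1st
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

def midpoints_to_classes(mids):
    """Rebuild continuous (exclusive type) classes from equally spaced
    mid-values, assuming every class has the same magnitude."""
    width = mids[1] - mids[0]
    return [(m - width / 2, m + width / 2) for m in mids]

# Illustrative inclusive classes of magnitude 5 with a gap of 1.
print(limits_to_boundaries([(20, 24), (25, 29), (30, 34)]))
# Illustrative mid-values spaced 5 apart.
print(midpoints_to_classes([2.5, 7.5, 12.5]))
```

The first call yields the boundaries 19.5 to 24.5, 24.5 to 29.5 and 29.5 to 34.5; the second yields the classes 0 to 5, 5 to 10 and 10 to 15, on which the rectangles of the histogram can then be erected.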


Example 4.21. Draw a histogram for the following frequency
distribution.

Variable        Frequency
10—20           12
20—30           30
30—40           35
40—50           65
50—60           45
60—70           25

Solution.

[Fig. 4.31. Histogram. X-axis: Variable; Y-axis: Frequency.]

Example 4.22. Represent the following distribution of marks of 100
students in the examination by a histogram.

Marks obtained          No. of students
Less than 10
Less than 20
Less than 30
Less than 40
Less than 50
Less than 60
Less than 70
Less than 80
Less than 90

Solution. First of all we shall convert the given cumulative
frequency distribution into the frequency distribution of marks, as
given in the following table.

Marks           No. of students

The histogram is shown in the following diagram.

[Fig. 4.32. Histogram. Y-axis: Number of students.]


Example 4.23. Represent the following data by means of a histogram.

Weekly Wages   : 10—15 15—20 20—25 25—30 30—40 40—60 60—80
No. of workers :   7     19    27    15    12    12     8
[Delhi U. B. Com. (Hons.) 1977]

Solution. Since the class intervals are of unequal magnitude, the
corresponding frequencies have to be adjusted to obtain the so-called
‘frequency density’, so that the area of the rectangle erected on
each class interval is equal (proportional) to the class frequency.
We observe that the first four classes are of magnitude 5, the class
30—40 is of magnitude 10 and the last two classes, 40—60 and 60—80,
are of magnitude 20. Since 5 is the minimum class interval, the
frequency of the class 30—40 is divided by 2 and the frequencies of
the classes 40—60 and 60—80 are divided by 4, as shown in the
following table.

Weekly wages   No. of workers (f)   Magnitude of class   Height of rectangle
10—15                 7                     5                  7
15—20                19                     5                 19
20—25                27                     5                 27
25—30                15                     5                 15
30—40                12                    10                 12/2 = 6
40—60                12                    20                 12/4 = 3
60—80                 8                    20                  8/4 = 2

[Fig. 4.33. Histogram. X-axis: Weekly wages (in Rs.).]


Example 4.24. Two brands of tyres are tested with the following
results.

Life (in thousands        Number of tyres
of miles)              Brand X     Brand Y
20—25                      1
25—27.5                    7
27.5—30                   15
30—31                     10
31—32                     15
32—33                     17
33—34                     13
34—35                      9
35—37.5                    8
37.5—40                    2
40—45                      3
Total                    100         100

Draw a histogram for each frequency distribution.
[Delhi U. B. Com. (Hons.) 1971]

Solution. Since the class intervals are of unequal magnitude, the
frequencies have to be adjusted (as discussed in Example 4.23). Since
the minimum magnitude of the given classes is 1 (see classes 30—31,
31—32, ..., 34—35), the frequencies of the classes 25—27.5, 27.5—30,
35—37.5 and 37.5—40, each of which is of magnitude 2.5, are to be
divided by 2.5, i.e., 5/2, while the frequencies of the classes
20—25 and 40—45, each of which is of magnitude 5, are to be divided
by 5. The adjusted frequencies (which will serve as the heights of
the rectangles on the class intervals) are obtained in the following
table.

Life in          Magnitude        Brand X                   Brand Y
thousands        of class    Frequency  Adjusted       Frequency  Adjusted
of miles         interval               frequency                 frequency
20—25              5             1      1/5 = 0.2
25—27.5            2.5           7      7/2.5 = 2.8        4      4/2.5 = 1.6
27.5—30            2.5          15      15/2.5 = 6
30—31              1            10      10
31—32              1            15      15
32—33              1            17      17
33—34              1            13      13
34—35              1             9       9
35—37.5            2.5           8      8/2.5 = 3.2
37.5—40            2.5           2      2/2.5 = 0.8
40—45              5             3      3/5 = 0.6

[Fig. 4.34. Histograms for Brand X tyres and Brand Y tyres. X-axis:
Life (thousands of miles); Y-axis: Frequency.]

B. FREQUENCY POLYGON 
The frequency polygon is another device for the graphic presentation
of a frequency distribution (continuous, grouped or discrete). In the
case of a discrete frequency distribution, the frequency polygon is
obtained on plotting the frequencies on the vertical axis (Y-axis)
against the corresponding values of the variable on the horizontal
axis (X-axis) and joining the points so obtained by straight lines.
[As an illustration, see Example 4.25.]

In the case of a grouped or continuous frequency distribution, the
frequency polygon may be drawn in two ways.

Case (i) From Histogram. First draw the histogram of the given
frequency distribution as explained in § 4.4.3 (A). Now join the
mid-points of the tops (upper horizontal sides) of the adjacent
rectangles of the histogram by straight lines. The figure so obtained
is called a frequency polygon (a polygon is a closed figure bounded
by several straight sides). It may be noted that when the frequency
polygon is constructed as explained above, it cuts off a triangular
strip (which lies outside the frequency polygon) from each rectangle
of the histogram. But, at the same time, another triangular strip of
the same area, which is outside the histogram, is included under the
polygon, as shown by the shaded area in the diagram of Example 4.26.
[This, however, is not true in the case of unequal class intervals.]
In order that the area of the frequency polygon be equal to the area
of the corresponding histogram of the frequency distribution, it is
necessary to close the polygon at both ends by extending it to the
base line so that it meets the X-axis at the mid-points of two
hypothetical classes, viz., the class before the first class and the
class after the last class, each with frequency zero. [See Examples
4.26 and 4.27.]

Case (ii) Without Constructing Histogram. The frequency polygon of a
grouped or continuous frequency distribution is a straight line graph
which can also be constructed directly without drawing the histogram.
This consists in plotting the frequencies of the different classes
(along the Y-axis) against the mid-values of the corresponding
classes (along the X-axis). The points so obtained are joined by
straight lines to obtain the frequency polygon. As in case (i), the
frequency polygon so obtained should be extended to the base at both
ends by joining the extreme points (first and last points) to the
mid-points of the two hypothetical classes (before the first class
and after the last class) assumed to have zero frequencies. The
figure of the frequency polygon so obtained would be exactly the same
as in case (i), except for the histogram.


This point can be elaborated mathematically as follows. Let
$x_1, x_2, \ldots, x_n$ be the mid-values of $n$ classes with
frequencies $f_1, f_2, \ldots, f_n$ respectively. We plot the points
$(x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n)$ on the co-ordinate axes,
taking mid-values along the X-axis and frequencies along the Y-axis,
and join them by straight lines. The first point $(x_1, f_1)$ is
joined to the point $(x_0, 0)$ and the last point $(x_n, f_n)$ to the
point $(x_{n+1}, 0)$ by straight lines, where $x_0$ and $x_{n+1}$ are
the mid-values of the hypothetical classes before the first class and
after the last class, and the required frequency polygon is obtained.
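As a rough check of the construction just described, the following Python sketch builds the vertices of a frequency polygon for the distribution of Example 4.21 (classes 10—20 to 60—70) and verifies that, for equal class intervals, the area under the closed polygon equals the area of the histogram. The function names `polygon_vertices` and `area_under` are hypothetical helpers introduced only for this illustration.

```python
def polygon_vertices(mids, freqs):
    """Vertices (x_0, 0), (x_1, f_1), ..., (x_n, f_n), (x_{n+1}, 0).

    Assumes equal class intervals, so the mid-values of the two
    hypothetical end classes lie one class width beyond x_1 and x_n."""
    h = mids[1] - mids[0]
    return [(mids[0] - h, 0)] + list(zip(mids, freqs)) + [(mids[-1] + h, 0)]

def area_under(vertices):
    """Area between the polygon and the X-axis (sum of trapezia)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(vertices, vertices[1:]))

# Data of Example 4.21: classes 10-20, 20-30, ..., 60-70.
mids = [15, 25, 35, 45, 55, 65]
freqs = [12, 30, 35, 65, 45, 25]

vertices = polygon_vertices(mids, freqs)
polygon_area = area_under(vertices)
histogram_area = sum(f * 10 for f in freqs)   # each rectangle has width 10

print(vertices[0], vertices[-1])   # end points on the base line
print(polygon_area, histogram_area)
```

Both areas come to 2120 (class width 10 times total frequency 212), which is why the polygon must be closed on the base line at the mid-points of the two hypothetical classes.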
Remarks 1. The frequency polygon can be drawn directly without the
histogram (as explained above) if only the mid-points of the classes
are given, without forming the continuous frequency distribution that
is required in the case of the histogram.

2. Frequency Polygon Vs. Histogram : (i) A histogram is a
two-dimensional figure, viz., a collection of adjacent rectangles,
whereas a frequency polygon is a line graph.

(ii) The frequency polygon can be used more effectively for the
comparative study of two or more frequency distributions, because
frequency polygons of different distributions can be drawn on the
same graph. This is not possible in the case of histograms, where we
need a separate histogram for each of the frequency distributions.
However, for studying the relationship of the individual class
frequencies to the total frequency, the histogram gives a better
picture and is accordingly preferred to the frequency polygon.

(iii) In the construction of a frequency polygon we come across the
same difficulties as in the construction of histograms, viz.,

(a) It cannot be constructed for frequency distributions with
open-end classes, and

(b) Suitable adjustments, as in the case of the histogram, are
required for frequency distributions with unequal classes.

(iv) Unlike the histogram, the frequency polygon is a continuous
curve and therefore possesses all the distinct advantages of graphic
representation, viz., it may be used to determine the slope, rate of
change, estimates (interpolation and extrapolation), etc., wherever
admissible.

Example 4.25. The following data show the number of accidents
sustained by 313 drivers of a public utility company over a period of
5 years.

Number of accidents : 0  1  2  3  4  5  6  7  8  9  10  11
Number of drivers   : 81 44 68 41 26 20 19 7 58 4 3 2

Draw the frequency polygon.

Solution.

[Fig. 4.35. Frequency polygon of number of accidents. X-axis: Number
of accidents; Y-axis: Number of drivers.]


Example 4.26. The following table gives the frequency distribution
of the weekly wages of 100 workers in a factory.

Weekly Wages        Number of Workers
20—24                      4
25—29                      5
30—34                     12
35—39                     23
40—44                     31
45—49                     10
50—54                      8
55—59                      5
60—64                      2

Draw the histogram and frequency polygon of the distribution.

Solution. Since all the classes are of equal magnitude, i.e., 5, for
the construction of the histogram the heights of the rectangles to be
erected on the classes will be proportional to their respective
frequencies. However, since the classes are not continuous, the given
distribution is to be converted into a continuous frequency
distribution with exclusive type classes before erecting the
rectangles, as given in the following table.

Weekly Wages        Number of Workers (f)
19.5—24.5                  4
24.5—29.5                  5
29.5—34.5                 12
34.5—39.5                 23
39.5—44.5                 31
44.5—49.5                 10
49.5—54.5                  8
54.5—59.5                  5
59.5—64.5                  2

As usual, the frequency polygon is obtained from the histogram by
joining the mid-points of the tops of the rectangles by straight
lines, and is extended both ways to the mid-points of the classes
14.5—19.5 and 64.5—69.5 on the X-axis, as shown in the following
diagram.

[Fig. 4.36. Histogram and frequency polygon. X-axis: Weekly wages
(19.5 to 69.5); Y-axis: Number of workers.]


Remark. It may be pointed out that the frequency polygon can be drawn
straight away, without converting the given distribution into a
continuous one, by plotting the frequencies against the mid-points of
the corresponding classes and joining these points by straight lines.


Example 4.27. Draw the histogram and frequency polygon for the
following frequency distribution.

Mid-value of class interval     Frequency
 2.5                             7
 7.5                            10
12.5                            20
17.5                            13
22.5                            17
27.5                            10
32.5                            14
37.5                             9

Solution. Since we are given the mid-values of the class intervals,
for the construction of the histogram the distribution is to be
transformed into continuous class intervals, each of magnitude 5
(under the assumption that the frequencies are uniformly distributed
throughout the class intervals), as given in the following table.

Class interval      Frequency
 0—5                  7
 5—10                10
10—15                20
15—20                13
20—25                17
25—30                10
30—35                14
35—40                 9

The histogram and frequency polygon are shown in the following
diagram.

[Fig. 4.37. Histogram and frequency polygon. X-axis: Classes; Y-axis:
Frequency.]

Remark. It may be noted that the frequency polygon can be drawn
directly without converting the given distribution into classes. The
frequencies are plotted against the corresponding mid-points (given)
and joined by straight lines.

C. FREQUENCY CURVE


A frequency curve is a smooth free hand curve drawn through the
vertices of a frequency polygon. The object of smoothing the
frequency polygon is to eliminate, as far as possible, the random or
erratic fluctuations in the data. A frequency curve may be regarded
as the limiting form of the frequency polygon as the number of
observations (total frequency) becomes very large and the class
intervals are made smaller and smaller.

Remarks 1. Smoothing should be done very carefully so that the curve
is as regular as possible and sudden and sharp turns are avoided.
Smoothing can be conveniently done for data that give rise to
symmetrical curves. However, it cannot be done effectively for data
relating to social, economic and business phenomena, as such data
usually give rise to skewed (asymmetrical) curves. [For details see
§ 4.4.3 C (b).] In fact, it is desirable to attempt a frequency
curve only if we have sufficient reasons to believe that the
frequency distribution under study is fairly regular. It is futile to
attempt a frequency curve for an irregular distribution. In general,
frequency curves should be attempted

(i) for frequency distributions based on samples, and

(ii) when the distribution is continuous.

2. We have already seen that a frequency polygon can be drawn with or
without a histogram. However, to obtain an ideal frequency curve for
a given frequency distribution, it is desirable to proceed in a
logical sequence, viz., first draw a histogram, then a frequency
polygon and finally a frequency curve, because in the absence of a
histogram the smoothing of the frequency polygon cannot be done
properly. As discussed for the frequency polygon, the frequency curve
should also be extended to the base on both sides of the histogram so
that the area under the frequency curve represents the total
frequency of the distribution.

3. A frequency curve can be used with advantage for interpolation
[i.e., estimating the frequencies for a given value of the variable
or in a given interval (within the given range of the variable)],
provided it rises gradually to the highest point and then falls more
or less in the same manner. It can also be used to determine the
rates of increase or decrease in the frequencies. It also enables us
to get an idea of the Skewness and Kurtosis of the distribution.

Example 4.28. Draw a frequency curve for the following distribution.

Age (Yrs.)      : 17-19 19-21 21-23 23-25 25-27 27-29 29-31
No. of Students :   7    13    24    30    22    15     5

Solution.

[Fig. 4.38. Frequency curve drawn over the histogram. X-axis: Age
(years); Y-axis: Number of students.]

Types of Frequency Curves. Though different types of data may give
rise to a variety of frequency curves, we shall discuss below only
some of the important curves which, in general, describe most of the
data observed in practice, viz., the data relating to natural,
social, economic and business phenomena.

(a) Curves of Symmetrical Distributions. In a symmetrical
distribution, the class frequencies first rise steadily, reach a
maximum and then diminish in the same identical manner.

If a curve can be folded about a vertical line (corresponding to the
maximum frequency) so that the two halves of the figure coincide, it
is called a symmetrical curve. It has a single smooth hump in the
middle, tapers off gradually at either end and is bell shaped.

The following hypothetical distribution of marks in a test will give
a symmetrical frequency distribution.

Marks     : 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
Frequency :  40   70   120   160   180   160   120    70    40

If the data are presented graphically, we shall obtain a frequency
curve which is symmetrical.
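For equal class intervals, the symmetry of the above distribution can be checked by comparing the frequencies read from left to right with those read from right to left. The short Python sketch below does this; the function name `is_symmetrical` is a hypothetical helper used only for illustration.

```python
def is_symmetrical(freqs):
    """True if the class frequencies read the same from both ends
    (equal class intervals are assumed)."""
    freqs = list(freqs)
    return freqs == freqs[::-1]

# Hypothetical distribution of marks given above (classes 0-10, ..., 80-90).
marks_freq = [40, 70, 120, 160, 180, 160, 120, 70, 40]

symmetric = is_symmetrical(marks_freq)
modal_class_index = marks_freq.index(max(marks_freq))   # position of the hump

print(symmetric, modal_class_index)
```

Here the maximum frequency (180) falls in the middle class 40-50, and the frequencies on either side of it mirror each other.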


The most commonly and widely used symmetrical curve in Statistics is
the Normal frequency curve, which is given below. (For details, see
Chapter 14 on Theoretical Probability Distributions.)

[Fig. 4.39. Normal probability curve, symmetrical about X = Mean.]

The normal curve generally describes data relating to natural
phenomena like the tossing of a coin, the throwing of a die, etc.
Most of the data relating to psychological and educational statistics
also give rise to a normal curve. However, data relating to social,
business and economic phenomena generally do not conform to the
normal curve. They usually give the moderately asymmetrical (slightly
skewed) curves discussed below.


(b) Moderately Asymmetrical (Skewed) Frequency Curves. A frequency
curve is said to be skewed (asymmetrical) if it is not symmetrical.
Moderately asymmetrical curves are commonly observed in social,
economic and business phenomena. Such curves are stretched more to
one side than to the other. If the curve is stretched more to the
right (i.e., it has a longer tail towards the right), it is said to
be positively skewed, and if it is stretched more to the left (i.e.,
it has a longer tail towards the left), it is said to be negatively
skewed. Thus, in a positively skewed distribution most of the
frequencies are associated with smaller values of the variable, and
in a negatively skewed distribution most of the frequencies are
associated with larger values of the variable. The following figures
show positively skewed and negatively skewed distributions.

[Fig. 4.40. Positively skewed distribution and negatively skewed
distribution, with the mode marked on each curve.]

(c) Extremely Asymmetrical or J-Shaped Curves. The distributions in
which the value of the variable corresponding to the maximum
frequency is at one of the ends of the range (and not in the middle,
as in the case of symmetrical distributions) give rise to highly
skewed curves. When plotted, they give a J-shaped or inverted
J-shaped curve, and accordingly such curves are also called J-shaped
curves. In a J-shaped curve, the distribution starts with low
frequencies in the lower classes, the frequencies then increase
steadily as the value of the variable increases, and finally the
maximum frequency is attained in the last class, thus exhibiting a
peak at the extreme right end of the distribution. Such curves are
not regular curves but become unavoidable in certain situations. For
example, the distribution of mortality (death) rates (along the
Y-axis) w.r.t. age (along the X-axis), after ignoring the accidental
deaths, or the distribution of persons travelling in local state
buses (e.g., DTC in Delhi or BEST in Bombay) w.r.t. time from morning
hours, say, 7 A.M., to peak traffic hours, say, 10 A.M., will give
rise to a J-shaped distribution. Similarly, in an inverted J-shaped
curve the frequency decreases continuously with the increase in the
variate values, the maximum frequency being attained at the beginning
of the distribution. For example, the distribution of the quantity
demanded w.r.t. the price, or the number of depositors w.r.t. their
savings in a bank, or the number of persons w.r.t. their wages or
incomes in a city, will give an inverted J-shaped curve.

[Fig. 4.41. J-shaped curve (X-axis: Age) and inverted J-shaped curve
(X-axis: Price).]


(d) U-Curve. The frequency distributions in which the maximum
frequency occurs at the extremes (i.e., both ends) of the range and
the frequency keeps on falling symmetrically (about the middle), the
minimum frequency being attained at the centre, give rise to a
U-shaped curve. In this type of distribution, larger frequencies are
associated with the values of the variable at the extremes, i.e.,
with smaller and larger values, whereas smaller frequencies are
associated with the intermediate values, the central value having the
minimum frequency. Such distributions are generally observed in the
behaviour of total costs, where the curve initially falls steadily
and, after attaining the optimum level (in the middle), starts rising
steadily again. As another illustration, the distribution of persons
travelling in local state buses between morning and evening peak
hours will give, more or less, a U-shaped curve, as shown below.

[Fig. 4.42. U-shaped curve. X-axis: Time (morning peak hours to
evening peak hours); Y-axis: No. of persons.]


(e) Mixed Curves. In the curves discussed so far, we have seen that
the highest concentration of the values lies at the centre
(symmetrical curve), or near the centre (moderately asymmetrical
curve), or at the extremes (J-shaped and U-shaped curves). But
sometimes, though very rarely, we come across certain distributions
in which the maximum frequency is attained at two or more points.

[Fig. 4.43. Bi-modal curve and tri-modal curve. X-axis: Variable;
Y-axis: Frequency.]

Such curves are obtained in a distribution where, as the value of the
variable increases, the frequencies increase and decrease, then again
increase and decrease in an irregular manner; the phenomenon may be
repeated twice or thrice, as shown in the above diagrams, or even
more often. The distributions with two humps are called bi-modal
distributions and those with three humps are called tri-modal
distributions, while those with more than three humps are termed
multi-modal distributions. Such distributions are rarely observed in
practice and should be avoided as far as possible, because they
cannot be usefully employed for the computation of various
statistical measures and for statistical analysis.


D. OGIVE OR CUMULATIVE FREQUENCY CURVE 


Ogive, pronounced as ‘ojive’, is a graphic presentation of the
cumulative frequency (c.f.) distribution [See Chapter 3] of a
continuous variable. It consists in plotting the c.f. (along the
Y-axis) against the class boundaries (along the X-axis). Since there
are two types of cumulative frequency distributions, viz., ‘less
than’ c.f. and ‘more than’ c.f., we have accordingly two types of
ogives, viz.,

(i) ‘Less than’ ogive.
(ii) ‘More than’ ogive.

‘Less Than’ Ogive. This consists in plotting the ‘less than’
cumulative frequencies against the upper class boundaries of the
respective classes. The points so obtained are joined by a smooth
free hand curve to give the ‘less than’ ogive. Obviously, the ‘less
than’ ogive is an increasing curve, sloping upwards from left to
right, and has the shape of an elongated S.

‘More Than’ Ogive. Similarly, in the ‘more than’ ogive, the ‘more
than’ cumulative frequencies are plotted against the lower class
boundaries of the respective classes. The points so obtained are
joined by a smooth free hand curve to give the ‘more than’ ogive. The
‘more than’ ogive is a decreasing curve, sloping downwards from left
to right, and has the shape of an elongated S, upside down.
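The points to be plotted for the two ogives can be tabulated as in the following Python sketch, which uses the class frequencies of Example 4.29 (ten classes of width 100 from 0 to 1000). The function names `less_than_points` and `more_than_points` are hypothetical helpers, not part of any library.

```python
from itertools import accumulate

def less_than_points(boundaries, freqs):
    """(upper class boundary, 'less than' c.f.) for each class."""
    return list(zip(boundaries[1:], accumulate(freqs)))

def more_than_points(boundaries, freqs):
    """(lower class boundary, 'more than' c.f.) for each class."""
    total = sum(freqs)
    # Frequency lying below each lower boundary: 0, f1, f1 + f2, ...
    below = accumulate([0] + list(freqs[:-1]))
    return [(b, total - c) for b, c in zip(boundaries[:-1], below)]

# Frequencies of Example 4.29; class boundaries 0, 100, ..., 1000.
boundaries = list(range(0, 1001, 100))
freqs = [12, 28, 35, 65, 30, 20, 20, 17, 13, 10]

less_than = less_than_points(boundaries, freqs)
more_than = more_than_points(boundaries, freqs)

for point in less_than:
    print("less than", point)
for point in more_than:
    print("more than", point)
```

The ‘less than’ c.f. rises from 12 at the boundary 100 to 250 at the boundary 1000, while the ‘more than’ c.f. falls from 250 at the boundary 0 to 10 at the boundary 900, in agreement with the increasing and decreasing shapes described above.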


Remarks 1. We may draw both the ‘less than’ ogive and the ‘more
than’ ogive on the same graph. If this is done, they intersect at a
point. The foot of the perpendicular from their point of intersection
on the X-axis gives the value of the median. [See Examples 4.30 and
4.31.]

2. Ogives are particularly useful for the graphic computation of
partition values, viz., Median, Quartiles, Deciles, Percentiles, etc.
[For details, see Chapter 5.] They can also be used to determine
graphically the number or proportion of observations below or above a
given value of the variable, or lying within a certain interval of
the values of the variable.

3. Ogives can be used with advantage over frequency curves for the
comparative study of two or more distributions because, as with
frequency curves, the ogives of the different distributions can be
constructed on the same graph, and they generally overlap less than
the corresponding frequency curves.

4. If the class frequencies are large, they can be expressed as
percentages of the total frequency. The graph of the cumulative
percentage frequency is called a ‘percentile curve’.

Example 4.29. Draw a ‘less than’ cumulative frequency curve for the
following data and find from the graph the value of the seventh
decile.

Monthly income      No. of workers
  0—100                  12
100—200                  28
200—300                  35
300—400                  65
400—500                  30
500—600                  20
600—700                  20
700—800                  17
800—900                  13
900—1000                 10

[Bombay U. B. Com. April 1983]

Solution.

‘LESS THAN’ CUMULATIVE FREQUENCY TABLE

Monthly income      ‘Less than’ c.f.
Less than 100            12
Less than 200            40
Less than 300            75
Less than 400           140
Less than 500           170
Less than 600           190
Less than 700           210
Less than 800           227
Less than 900           240
Less than 1000          250

The ‘less than’ cumulative frequency curve is obtained on plotting
the ‘less than’ c.f. against the upper limit of the corresponding
class and joining the points so obtained by a smooth free hand curve,
as shown below :

[Fig. 4.44. Ogive. X-axis: Monthly income (0 to 1000); Y-axis: No. of
workers (0 to 250); the value 545 is marked on the X-axis.]

To obtain the value of the seventh decile from the graph, at the
frequency $\frac{7}{10} N = \frac{7}{10} \times 250 = 175$ draw a
line parallel to the X-axis meeting the ‘less than’ c.f. curve at the
point P. From P draw PM perpendicular to the X-axis, meeting it at M.
Then the value of the seventh decile is

$D_7 = OM = 545$ [From the graph]
Example 4.30. The following table gives the distribution of monthly
income of 600 families in a certain city.

Monthly Income (Rs.)     No. of Families
Below 75
75—150
150—225
225—300
300—375
375—450
450 and over

Draw a ‘less than’ and a ‘more than’ ogive curve for the above data
on the same graph and from these read the median income.
[Delhi Uni. B. Com. (Hons.) 1974]

Solution. For drawing the ‘less than’ and ‘more than’ ogives we
convert the given distribution into ‘less than’ and ‘more than’
cumulative frequencies (c.f.), as given in the following table.

Monthly Income     No. of Families     ‘Less than’     ‘More than’
(Rs.)                   (f)               c.f.            c.f.
Below 75
75—150                                                     540
150—225                                                    370
225—300                                                    170
300—375                                                    110
375—450
450 and over

From the point of intersection of these two ogives, draw a line
perpendicular to the X-axis (monthly incomes). The abscissa
(x-co-ordinate) of the point where this perpendicular meets the
X-axis gives the value of the median.

The ‘more than’ and ‘less than’ ogives and the value of the median
are shown in Fig. 4.45.

[Fig. 4.45. ‘Less than’ and ‘more than’ ogives. X-axis: Monthly
income (Rs.); Y-axis: Number of families (0 to 600). Median = 176
(approx.).]

Example 4.31. Convert the following distribution into ‘more
than’ frequency distribution. 


Weekly Wages less than (Rs.)    No. of Workers
20                              41
40                              92
60                              156
80                              194
100                             201

For the data given above, draw ‘less than’ and ‘more than’ ogive
and hence find the value of median. (Delhi Uni. B. Com. 1979)


Solution. The above data are in the form of a ‘less than’
cumulative frequency distribution. The frequency distribution
together with ‘less than’ and ‘more than’ cumulative frequencies is
given in the table on page 186.


Weekly wages    No. of workers     Cumulative frequency (c.f.)
(in Rs.)        (f)                Less than    More than

0–20            41                 41           160 + 41 = 201
20–40           92 - 41 = 51       92           51 + 109 = 160
40–60           156 - 92 = 64      156          64 + 45 = 109
60–80           194 - 156 = 38     194          38 + 7 = 45
80–100          201 - 194 = 7      201          7

‘Less than’ and ‘more than’ ogives are shown in the following
diagram.

[Fig. 4.46. ‘Less than’ and ‘more than’ ogives. X-axis: weekly wages; Y-axis: number of workers. Median = 42.7.]

From the above diagram, the value of median wage is
Rs. 42.70 approximately.
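The graphical reading in Example 4.31 can be checked numerically. The sketch below is only an approximation of the free hand method: it assumes the ogive is made of straight segments between the plotted points and uses the fact that the two ogives cross at height N/2. The function names are illustrative and not part of the text.

```python
from bisect import bisect_left

def less_to_more(less_cf):
    """Return (class frequencies, 'more than' c.f.) from 'less than' c.f."""
    freqs = [less_cf[0]] + [b - a for a, b in zip(less_cf, less_cf[1:])]
    total = less_cf[-1]
    # 'More than' c.f. at a class's lower limit = total minus the c.f. below it.
    more_cf = [total] + [total - c for c in less_cf[:-1]]
    return freqs, more_cf

def median_from_less_ogive(lower_limit, upper_limits, less_cf):
    """Read the abscissa at height N/2 on a straight-line 'less than' ogive."""
    half = less_cf[-1] / 2
    xs = [lower_limit] + list(upper_limits)
    ys = [0] + list(less_cf)
    i = bisect_left(ys, half)  # first plotted point with c.f. >= N/2
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return x0 + (half - y0) * (x1 - x0) / (y1 - y0)

# Data of Example 4.31: weekly wages 'less than' 20, 40, ..., 100.
upper = [20, 40, 60, 80, 100]
less_cf = [41, 92, 156, 194, 201]

freqs, more_cf = less_to_more(less_cf)
median = median_from_less_ogive(0, upper, less_cf)
print(freqs, more_cf, round(median, 2))
```

With straight segments the median comes out close to the Rs. 42.70 read from the graph; a smooth free hand curve may give a slightly different reading.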


Example 4.32. Draw a percentile curve for the following
distribution of marks obtained by 700 students at an examination.
Marks No. of Students Marks No. of Students
0–9 9 50–59 102
10–19 42 60–69 71
20–29 61 70–79 23
30–39 140 80–89 2
40–49 250


Find from the graph (i) the marks at the 20th percentile, and (ii)
the percentile equivalent to a mark of 65.


(Delhi U. B. Com. 1978)

Solution. A percentile curve is obtained on expressing the
‘less than’ cumulative frequencies as percentage of the total
frequency and then plotting these cumulative percentage frequencies
(P) against the upper boundary of the corresponding class (x).
These points are then joined by a smooth free hand curve.


COMPUTATION OF CUMULATIVE PERCENTAGE
FREQUENCY DISTRIBUTION


Marks            Frequency    ‘Less than’    Percentage ‘less than’
                 (f)          c.f.           c.f. (P)

-0.5–9.5         9            9              1.29
9.5–19.5         42           51             7.29
19.5–29.5        61           112            16.00
29.5–39.5        140          252            36.00
39.5–49.5        250          502            71.71
49.5–59.5        102          604            86.29
59.5–69.5        71           675            96.43
69.5–79.5        23           698            99.71
79.5–89.5        2            700            100.00


Percentage ‘less than’ c.f. = (‘Less than’ c.f. / Total frequency) × 100

[Fig. 4.47. Percentile curve. X-axis: marks (class boundaries 9.5 to 99.5); Y-axis: percentage ‘less than’ c.f.]

(i) To find marks (x) corresponding to the 20th percentile,
at P = 20, draw a line parallel to X-axis, meeting the
percentile curve at A. Draw AM perpendicular to X-axis,
meeting X-axis at M. Then OM = 31.5 gives the marks
at the 20th percentile.


(ii) To find percentile equivalent to mark x = 65, at x = 65
draw perpendicular to X-axis meeting the percentile
curve at B. From B draw a line parallel to X-axis meeting
the Y-axis at N. Then ON = 92 is the percentile
equivalent to score of 65.
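The two readings from the percentile curve can be approximated by linear interpolation between the plotted points. This is a sketch only: it replaces the smooth free hand curve by straight segments, and the helper names are illustrative assumptions rather than anything defined in the text.

```python
def cumulative_percentages(freqs):
    """Percentage 'less than' c.f. at the upper boundary of each class."""
    total = sum(freqs)
    running, out = 0, []
    for f in freqs:
        running += f
        out.append(100 * running / total)
    return out

def interpolate(xs, ys, x):
    """Piecewise-linear value of y at x, for increasing xs."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x outside the plotted range")

# Data of Example 4.32 with class boundaries -0.5, 9.5, ..., 89.5.
freqs = [9, 42, 61, 140, 250, 102, 71, 23, 2]
bounds = [-0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
P = [0.0] + cumulative_percentages(freqs)

marks_at_p20 = interpolate(P, bounds, 20)      # (i) marks at the 20th percentile
percentile_of_65 = interpolate(bounds, P, 65)  # (ii) percentile of a mark of 65
print(round(marks_at_p20, 1), round(percentile_of_65))
```

The interpolated values agree with the graphical readings OM = 31.5 and ON = 92 (to the nearest whole percentile).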


4.4.4. Graphs of Time Series or Historigrams. A time
series is an arrangement of statistical data in a chronological order,
i.e., with respect to occurrence of time. The time period may be a
year, quarter, month, week, days, hours and so on. Most of the
series relating to economic and business data are time series such as
population of a country, money in circulation, bank deposits and
clearings, production and price of commodities, sales and profits of
a departmental store, imports and exports of a country etc. Thus in
a time series data there are two variables ; one of them, the
independent variable, being time and the other (dependent) variable
being the phenomenon under study.


The time series data are represented geometrically by means of
Time Series Graph which is also known as Historigram. The
independent variable viz., time is taken along the X-axis and the
dependent variable is taken along the Y-axis. The various points so
obtained are joined by straight lines to get the time series graph. If
the actual time series data are graphed, the historigram is called
Absolute Historigram. However, the graph obtained on plotting the
index number of the given values is called Index Historigram and
it depicts the percentage changes in the values of the phenomenon
as compared to some fixed base period. Historigrams are extensively
used in practice. They are easy to draw and understand and do
not require much skill and expertise to construct and interpret.


Remark. Time series graphs can be drawn on a natural
(arithmetic) scale or on a ratio (semi-logarithmic or logarithmic)
scale, the former reflecting the absolute changes from one period to
another and the latter depicting the relative changes or rates of
change. In the following sections we shall study the time series
graphs on a natural scale. Ratio scale graphs are discussed in
§ 4.4.5.


The various types of time series graphs are : 
(i) Horizontal Line Graphs or Historigrams 
(ii) Silhouette or Net Balance Graphs 
(iii) Range or Variation Graphs
(iv) Components or Band Graphs.
Now we shall discuss them briefly, one by one.
A. HORIZONTAL LINE GRAPHS OR HISTORIGRAMS 
In such a graph only one variable is to be represented graphi- 
cally. As already explained, the desired graph (historigram) is 
obtained on plotting the time variable along the X-axis and the 
other variable viz., the magnitudes of the phenomenon under con- 
sideration along the Y-axis on a suitable scale and joining the points 
so obtained by straight lines. An illustration is given below. 
Example 4.33. Draw the graph of the following : 
Year 1920 1921 1922 1923 1924 1925 1926 1927
Yield
(in million tons) 12.8 13.9 12.8 13.9 13.4 6.5 2.9 14.8
[I.C.W.A. (Intermediate) December 1977]
Solution. Taking the scale along X-axis as 1 cm = 1 year and
along Y-axis as 1 cm = 2 million tons, the required graph is as
shown below.


[Fig. 4.48. Yield (in million tons) for different years. X-axis: years; Y-axis: yield (in million tons).]

FALSE BASE LINE. As explained earlier, if the fluctuations in
the values of the variable (to be shown along the Y-axis) are small
as compared to their magnitudes and if the minimum value of the
variable is very distant from origin, i.e., zero, then the technique of
false base line is used to highlight these fluctuations. Illustrations
are given in Examples 4.34 and 4.35.
HISTORIGRAM. 
The time series data relating to two or more related variables,
i.e., phenomena measured in the same unit and belonging to the
same time period, can be displayed together in the same graph using
the same scale for all the variables along the vertical axis and the
same scale for time along X-axis for each variable. The method for
drawing such graphs is same as that of historigram for one variable.
Thus we shall get a number of curves, one for each variable. They
should be distinguished from each other by the use of different types
of lines viz., thin and thick lines, dotted lines, dash lines, dash-dot
lines etc., and an index to this effect should be given for proper
identification of the curves. The following illustration will clarify
the point.


Example 4.34. The following table gives the index numbers of
industrial production.


INDEX NUMBER OF INDUSTRIAL PRODUCTION
Base : 1970 = 100

Item              1971    1972    1973    1974    1975    1976

Cement            107.0   113.1   107.6   102.6   116.7   138.9
Iron and steel    100.6   112.0   96.1    100.2   119.3   145.0
General Index     104.2


Ans. As usual, we take time (years) along X-axis and the
index numbers along Y-axis. Using false base line (for vertical
axis) at 95, the graph is shown on page 191.

[Fig. 4.49. Index number of industrial production (Base: 1970 = 100). Curves: cement, iron & steel, general index. X-axis: years (1971 to 1976); Y-axis: index numbers.]


Remarks 1. The technique of drawing two or more historigrams
on the same graph facilitates comparisons between the related
phenomena. However, its use is not recommended if the
number of variables is large, say, more than 4. In such a case the
different line graphs, which may intersect each other, become quite
confusing and it becomes quite difficult to understand and interpret
them.

2. The graph obtained on plotting the index number is
known as index historigram and it represents the relative changes in
the values of the variables under consideration. Two or more
variable index historigrams on the same graph obviously facilitate
comparisons. However, in order to arrive at any valid conclusions,
the index numbers for all the variables should be computed with
respect to the same base period.

3. Graphs of two variables measured in different units. The
time series data relating to two related phenomena which are
measured in different units, e.g., imports (quantity in million tons) and
imports (values in crores of Rupees) but pertaining to the same time


each variable proportional to its average value, i.e., the average
value of each variable is kept in or near about the middle of the
vertical scale in the graph and the scale for each is selected
accordingly. We explain this point by the following illustration.
Example 4.35. Plot a graph to represent the following data in a
suitable manner.
Year 1920 1921 1922 1923 1924 1925 1926 1927
Imports ('000 mds) 400 450 560 620 580 460 500 540
Imports ('000 Rs.) 220 235 385 420 420 380 360 400

Solution. The time variable (year) is recorded along the
X-axis with scale 1 cm = 1 year and the variate values, imports
(volume in mds) and imports (value in Rs.), are recorded along the
Y-axis with scales :


1 cm = 20,000 mds (for imports-quantity)
1 cm = 20,000 Rs. (for imports-value)


and false base line is selected at 400,000 mds (for quantity) and
Rs. 220,000 (for value). The graph is shown on page 192.

[Figure: Volume and value of imports (1920-1927).]


B. SILHOUETTE OR NET BALANCE GRAPH 


This graph is specially used to highlight the difference or the
net balance between the values of two variables along the vertical
axis, e.g., the difference between imports and exports of a country
in different years, sales and purchases of a business concern for
different periods, the income and expenditure of a family in
different months and so on. This can be done in any one of the
following two ways :


Method 1. Obtain the net balance viz., the difference between
two sets of the values of the phenomena (variables) for different
periods. Some of these differences may be negative also. Now,
in addition to the two historigrams, one for each variable, draw
a third historigram for the net balance on the same graph. A
portion of this graph (corresponding to the negative values of the net
balance) will be below the X-axis.


Method 2. Draw two historigrams, one for each phenomenon
(variable), on the same graph. The net balance between the
variables is depicted by proper filling or shading of the space
between the two historigrams, depicting clearly the positive and
negative balance.

Both these methods are explained in the following illustration. 
Example 4.36. India’s overall balance of payment situation
(Billions of Rupees) is given below :

Years :     1970-71   1971-72   1972-73   1973-74   1974-75

Credits     18.9      20.9      24.2      46.1      40.7
Debits      22.2      24.9      26.7      33.0      47.2
Balance
(Credit - Debit)
            -3.3      -4.0      -2.6      13.1      -6.5


Represent the above data on the same graph. 


Solution. The above data can best be represented by the
Silhouette or Net Balance Graph. The graphs obtained on using
both the methods are given on page 194.

[Fig. 4.51. India’s overall balance of payments: credits and debits during 1970 - 1975 (along with balance of payments). X-axis: years; Y-axis: billions of rupees (scale extending below zero for negative balances).]
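The net balance series used in Method 1 is just the period-by-period difference of the two variables. A minimal sketch with the figures of Example 4.36 follows; it assumes the 1973-74 debit is 33.0, the value consistent with the printed balance of 13.1, and the function name is illustrative. Note that credit minus debit for 1972-73 works out to -2.5, slightly different from the balance printed in the table.

```python
years = ["1970-71", "1971-72", "1972-73", "1973-74", "1974-75"]
credits = [18.9, 20.9, 24.2, 46.1, 40.7]
debits = [22.2, 24.9, 26.7, 33.0, 47.2]  # 33.0 for 1973-74 is an assumption

def net_balance(credits, debits):
    """Credit minus debit for each period, rounded to one decimal place."""
    return [round(c - d, 1) for c, d in zip(credits, debits)]

balance = net_balance(credits, debits)

# Periods whose balance lies above the X-axis (positive) on the graph.
surplus_years = [y for y, b in zip(years, balance) if b > 0]
print(balance, surplus_years)
```

Every period with a negative balance is a point of the third historigram that falls below the X-axis; in Method 2 the same periods are the ones where the debits curve lies above the credits curve.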
C. RANGE GRAPH


By range we mean the deviation, i.e., difference between the
two extreme values viz., the maximum and minimum values of the
variable under consideration.


Range graph, also sometimes known as zone graph, is used to
depict and emphasize the range of variation of a phenomenon for
each period. For instance, for highlighting the range of variation
of :


(i) the temperature on different days, 


(ii) the blood pressure readings of an individual on different 
days, 


Method 1. For each time period, plot the maximum and 
minimum values of the variable and join them by straight lines to 


Example 4.37. The following are the share price quotations
of a firm for five consecutive weeks. Present the data by an
appropriate diagram.


Week High Low
1 102 100
2 103 101
3 107 103
4 106 105
5 105 104


[Calicut Uni. B. Com. 1975 ; Himachal Pradesh Uni. M.B.A. 1979]


Solution. Since the maximum and minimum price quotations
of a firm for 5 consecutive weeks are given, the most appropriate
graph for it is the zone or the range graph.


Taking weeks along X-axis and share price quotations along
Y-axis and using the false base line at 100 for the vertical scale, the
range chart as obtained by method 1 is drawn in Fig. 4.53.

[Fig. 4.53. Range chart for share price quotations of a firm.]

The range chart may also be drawn as follows :

[Figure: Range chart for share price quotations of a firm (alternative form).]

D. BAND GRAPH


Like sub-divided bar diagram or pie diagram, the band graph,
also known as component part line chart, is a line graph used to
display the total value or the magnitude of a variable and its break up
into different components for each period. The construction of
such a chart, which is used only for time series, is quite simple and
involves the following steps :


(i) For each period arrange the break up of the value of the
variable into various components in the same order.


(ii) Draw historigram for the first component.


(iii) Over this historigram draw another historigram for the
2nd component. This is done by drawing the 2nd historigram for
the cumulative totals of the first two components.


(iv) Over the 2nd historigram draw another historigram for the
third component. This is done by drawing the historigram for the
cumulative totals of the first three components. This technique of
drawing historigrams, one over the other, is continued till all the
components are exhausted. The last historigram thus corresponds
to the total value of the variable.


The space between different historigrams in the form of
different bands or belts, one for each component, is prominently
displayed by different types of lines viz., dash lines, dot lines, dash-dot
lines etc. This chart is specially useful to display the division of the
total costs, total sales, total production etc., into various component
parts for different periods.


Remark. Just like percentage bar diagram or percentage
rectangular diagram, band chart can also be used for time series where
data are expressed in percentage form. In such a situation, the
total value of the variable for each period is taken as 100 and bands
will depict the percentage that different components bear to the
total.
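Steps (i) to (iv) amount to plotting cumulative totals, component by component. The sketch below illustrates this with made-up cost figures; the data, the component names and the function names are all hypothetical and are not taken from any example in the text.

```python
def band_boundaries(components):
    """Map each component to the historigram drawn for it, i.e. the
    cumulative total of that component and all components before it."""
    lines = {}
    running = None
    for name, values in components.items():
        if running is None:
            running = list(values)
        else:
            running = [r + v for r, v in zip(running, values)]
        lines[name] = running
    return lines

def as_percentages(components):
    """Express every component as a percentage of the period total."""
    totals = [sum(col) for col in zip(*components.values())]
    return {name: [100 * v / t for v, t in zip(values, totals)]
            for name, values in components.items()}

# Hypothetical costs for three periods, components kept in the same order.
costs = {
    "Material": [20, 24, 30],
    "Labour": [15, 18, 20],
    "Overhead": [9, 10, 14],
}

lines = band_boundaries(costs)
percent_lines = band_boundaries(as_percentages(costs))
print(lines["Overhead"])  # top line = total cost for each period
```

The last line drawn equals the total for each period, and in the percentage form it is level at 100, as the Remark above describes.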


Example 4.38. The following table gives the cost of production
(in arbitrary units) of a factory in biennial averages :


Material 20 

GT 15 
erhead 

= 44 


Total 


Represent the above data by a band graph.

Solution :

[Figure: Cost of production of factory (1968 - 1978), band graph.]

4.4.5. Semi-logarithmic Line Graphs or Ratio Charts. In
the graphs discussed so far, we have used arithmetic, i.e., natural
scale in which equal distances represent equal absolute magnitudes
on both the axes. Such graphs can be used with advantage if we
are interested in displaying the absolute changes in the value of a
phenomenon and the variations in the magnitudes are such that they
can be plotted in the available space in the graph paper. But quite
often, particularly in the case of phenomena pertaining to growth
like population, production, sales, profits etc., the increase or
decrease in the value of the variable is very rapid. In such a situation
we are primarily interested in studying the relative changes rather than
the absolute changes in the value of the phenomenon and the
arithmetic scale is not of much use. In such cases we use
semi-logarithmic or logarithmic or ratio scale which is basically used to
highlight or emphasize relative or proportionate or percentage changes
in the values of a phenomenon over different periods of time.


Since
$\log\left(\frac{a}{b}\right) = \log a - \log b$,
on a logarithmic scale, equal distances will represent equal
proportionate changes.
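As a small worked illustration of this property (the numbers are chosen only for illustration and logarithms are taken to base 10), a doubling from 10 to 20 and a doubling from 100 to 200 occupy the same distance on a logarithmic scale, although the absolute increases are 10 and 100 respectively:

```latex
\log 20 - \log 10 = \log\left(\frac{20}{10}\right) = \log 2 \approx 0.3010,
\qquad
\log 200 - \log 100 = \log\left(\frac{200}{100}\right) = \log 2 \approx 0.3010 .
```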

There are two ways of using logarithmic scale : 

Semi-Logarithmic Line Graph. In such a graph, the time
variable along X-axis is expressed on a natural scale and the
logarithms of the values of the phenomenon under study for different
periods of time are plotted on the vertical axis on a natural scale.
The points so obtained are joined by straight lines to give the desired
curve. Since, in this type of curve, the logarithms are taken along
only one axis, it is known as semi-logarithmic graph and it is specially
useful for studying the rates of change in the dependent variable
(phenomenon under study) for different periods of time in a time
series.

Logarithmic Line Graph. In this graph, both the variables
along horizontal and vertical axis are plotted on a logarithmic scale.
For instance, for a time series data, the logarithms of the time values
are plotted along horizontal axis and the logarithms of the values of
the variable are plotted along the vertical axis, each on a natural
scale. The required graph is obtained on joining the points so
obtained by straight lines. However, it is very difficult to interpret such
a graph and in practice, mostly semi-logarithmic graph is used.

Remarks 1. In a semi-logarithmic graph, almost always, the
vertical scale or Y-scale is a logarithmic scale. Since a
semi-logarithmic graph is useful for studying the relative changes or rates and
ratios of increase or decrease over different periods of time, it is also
called a Ratio Graph or Ratio Chart and the logarithmic scale is also
called ratio scale.

2. For practical purposes, semi-logarithmic graph papers (in
which vertical scale is logarithmic scale, i.e., log Y is marked along
Y-axis and horizontal scale is natural, i.e., the values of X are
marked in arithmetical scale), analogous to ordinary graph paper, are
available. The use of such a semi-logarithmic graph paper relieves
us of the problem of looking up and plotting the logarithms of the
values of a variable on a natural scale. The specimen of a
semi-logarithmic graph paper is given below.

[Fig. 4.56. Semi-logarithmic graph paper.]

3. The following diagram displays the arithmetic and
logarithmic scales.

[Fig. 4.57. Three vertical scales side by side: arithmetic scale; logarithms on arithmetic scale; logarithmic scale.]


The reason why the logarithmic scale shows smaller and smaller
distances as we move towards higher and higher magnitudes from 1
to 10 is that the series which is increasing by equal absolute
amounts (on an arithmetic scale) is increasing at a diminishing
rate.
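The shrinking distances can be seen by computing where the numbers 1 to 10 fall on a logarithmic scale. A minimal sketch using common (base 10) logarithms; the variable names are illustrative:

```python
import math

values = list(range(1, 11))

# Distance of each number from the origin of the scale (which is at 1).
positions = [math.log10(v) for v in values]

# Distance between consecutive numbers: log(n + 1) - log(n).
gaps = [b - a for a, b in zip(positions, positions[1:])]

for v, g in zip(values, gaps):
    print(f"{v} to {v + 1}: {g:.4f}")
```

Each step of 1 covers a smaller distance than the one before it, because going from n to n + 1 is a smaller proportionate increase as n grows.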

Arithmetic Scale Graphs vs. Ratio Scale Graphs 


1. A line graph on an arithmetic scale depicts the absolute
changes from one period to another whereas on a ratio scale it
reflects the rate of change between any two points of time. Thus
the graph drawn on natural scale will not be able to reflect the
relative or percentage changes or the rate of change of the phenomenon
for any two points of time. In most of the problems of growth, e.g.,
data relating to population, production, sales or profits of a business
concern, national income, etc., absolute changes, if shown on the
graph on a natural scale, are often misleading. As an illustration, let
us consider the following hypothetical figures relating to the profits
of a business concern.


Year    Profits        Increase over profits of preceding year
        (Rs. '000)     Absolute (Rs. '000)    Percentage

1970    15
1971    30             15                     100
1972    50             20                     66.7


Thus in the above table, although the absolute increase shows
a steady increase in the profits, the percentage or relative increase
registers a steady decline. It is surprising to note that the smallest
percentage increase (for the year 1975) corresponds to the greatest
absolute increase, a fact which is prominently displayed on a
semi-logarithmic graph by the flattening of the slope of the curve. Hence,
if the primary objective is to study the rate of change in the
magnitudes of a phenomenon, the data plotted on a natural scale will
give quite wrong and misleading conclusions. In such a case the
ratio or semi-logarithmic scale is the appropriate one.

2. On an arithmetic or natural scale equal absolute amounts
(along vertical axis) are represented by equal distances whereas in a
ratio scale equal distances represent equal proportionate movements
or equal relative rates of change or equal percentage changes. Thus
in a natural scale, the readings are in arithmetical progression while
in a ratio scale they are in geometric progression as exhibited in the
following diagram.


Natural scale    Ratio scale

5                32    320    3200
4                16    160    1600
3                8     80     800
2                4     40     400
1                2     20     200
0                1     10     100

Fig. 4.58.


Thus on a logarithmic scale, the distances between the points
on the vertical scale represent the distances of the logarithms of the
numbers and not the distances of the numbers themselves.


3. On a natural or arithmetic scale, the vertical scale must
start with zero. Since the logarithm of zero is minus infinity, i.e.,
since log (0) = -∞, in a ratio graph there is no zero base line. Thus,
in a ratio graph, the vertical scale starts with a positive number.
Further, since log (1) = 0, the value 1 is placed at a zero distance
from the origin, i.e., at the origin itself. Hence, in a ratio chart, the
origin along the vertical scale is at 1.

4. In case the magnitudes of the phenomenon under
consideration have a very wide range, i.e., the values differ widely in
magnitudes, then ratio graph is more appropriate than the graph
on an arithmetic scale. In this regard, James A. Field writes :


“It is far superior to the natural scale for affecting comparison
when very small and very large quantities must be taken into
account concurrently. Whenever a historical curve records extreme
growth the same advantage is found. It is not necessary to dwarf
the small beginnings in order to keep the later development within
manageable dimensions.”


5. In interpreting the graphs drawn on a natural scale, the
relative position of the curve on the graph is very significant. But,
in interpreting a ratio graph it is the shape, direction and degree of
steepness of the graph (i.e., straight line or a curve sloping upwards
or downwards) that matters and not its position. Accordingly, on a
semi-logarithmic scale, the different graphs can be moved up and
down without changing their meaning (interpretation). Hence
ratio graph can be effectively used to graph, for purposes of
comparisons, two or more phenomena (variables) which differ widely in
their magnitudes or which are measured even in different units. For
instance, for charting the data relating to the population growth,
agricultural or industrial output, prices, profits, sales etc., the ratio
graph or semi-logarithmic graph is more appropriate. Such
comparison, however, might be misleading on a natural scale.
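The contrast drawn in point 1 between absolute and percentage increases can be tabulated directly. In the sketch below the profits for 1970 to 1972 follow the hypothetical figures of the illustration (15, 30 and 50, in Rs. '000); the figures for 1973 to 1975 are assumptions added here only to continue the pattern, and the function name is illustrative.

```python
def increases(series):
    """Return (absolute, percentage) increase over the preceding value."""
    rows = []
    for prev, cur in zip(series, series[1:]):
        rows.append((cur - prev, 100 * (cur - prev) / prev))
    return rows

# Profits in Rs. '000; values from 1973 onward are assumed for illustration.
profits = {1970: 15, 1971: 30, 1972: 50, 1973: 75, 1974: 105, 1975: 140}

years = sorted(profits)
rows = increases([profits[y] for y in years])
absolute = [a for a, _ in rows]
percentage = [p for _, p in rows]

for year, (a, p) in zip(years[1:], rows):
    print(year, a, round(p, 1))
```

The absolute increases rise steadily while the percentage increases fall, so the greatest absolute increase (in the last year) goes with the smallest percentage increase, which is what a semi-logarithmic graph shows as a flattening slope.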


Uses of Semi-Logarithmic scale or Ratio Scale. From the 
above discussion, the uses of ratio or semi-logarithmic scale may be 
summarised as follows : 


1. For studying the rates of change (increase or decrease) or
the relative or percentage changes in the values of a phenomenon
like population, production, sales, profit, income, etc.

2. For charting two or more phenomena differing very widely 
in their magnitudes. 

3. For charting and comparative study of two or more 
phenomena measured in different units. 

4. When we are interested in proportionate or percentage 
changes rather than absolute changes. 


Limitations of Semi-Logarithmic or Ratio Scale. 


1. Since log (0) = -∞ and the logarithm of a negative quantity
is not defined, the ratio scale cannot be used to plot zero or
negative values. Accordingly, it cannot be used to represent the ‘Net
Balance’ or ‘Balance of Trade’ on the graph.
2. Another limitation of the ratio scale is that it cannot be
used to study the total magnitude and its break up into component
parts of any given phenomenon.

3. It cannot be used to study absolute variations. 

4. Lastly, it is quite difficult for a layman to draw and inter- 
pret ratio charts. The interpretation of a semi-logarithmic graph 
requires great skill and expertise. This is a great handicap in the 
mass popularity of ratio or semi-logarithmic graphs. 
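The first limitation above can be made concrete with a small check before plotting. This is a sketch under the assumption that positions on the ratio scale are common logarithms; the function name is illustrative.

```python
import math

def ratio_scale_positions(values):
    """Return log10 of each value, or raise if a value cannot be plotted."""
    bad = [v for v in values if v <= 0]
    if bad:
        # Zero has no finite logarithm and negative numbers have none at all.
        raise ValueError(f"cannot be plotted on a ratio scale: {bad}")
    return [math.log10(v) for v in values]

positions = ratio_scale_positions([1, 10, 100])  # 1 sits at the origin

try:
    ratio_scale_positions([-3.3, -4.0, 13.1])    # a net balance series
    plotted = True
except ValueError:
    plotted = False
print(positions, plotted)
```

A net balance series with negative entries is rejected, which is why such data are shown on a natural scale instead.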

Shape of the Curve on Semi-Logarithmic and Natural
Scales.

1. The values of a phenomenon increasing by a constant
amount will give a straight line rising upward on an arithmetic
scale while on a semi-logarithmic scale it will give an upward rising
curve with its slope steadily declining (which implies a steadily
decreasing rate). In other words, it will be a curve concave to the
base. This is so because the values increasing by a constant absolute
amount increase at a declining rate. This is shown in the following
diagram.

[Fig. 4.59. Series increasing by a constant absolute amount: arithmetic scale (left) and logarithmic scale (right). X-axis: years 1970 to 1979.]


2. A time series increasing at a constant rate will give a 
curve convex to the base (i.e., a curve rising upwards towards the 
right with its slope gradually increasing), on a natural scale. How- 
ever, on a ratio scale, it will give an upward rising straight line 
graph as shown in the following diagram. 


[Fig. 4.60. Series increasing at a constant rate: arithmetic scale and logarithmic scale.]

3. If the time series values decrease by a constant absolute
amount the graphs on the two scales will be like the mirror images
of the graphs in case 1, in the reverse order (as shown in the
diagram 4.61), i.e., on a natural scale it will give a straight line
moving downwards (rapidly declining) and on a ratio scale it will
give a curve falling to the right with its slope increasing.

[Fig. 4.61. Series decreasing by constant absolute amount: arithmetic scale and logarithmic scale.]


4. Similarly for a time series decreasing at a constant rate 
the graphs on the two scales will be the mirror images of the graphs 
in case 2, in the reverse order (as shown in the following diagram) 
i.e., on an arithmetic scale, we shall get a curve moving downwards 
with a declining slope and on a semi-logarithmic scale we shall get 
a straight line moving downwards. 


[Fig. 4.62. Series decreasing by constant rate: arithmetic scale and logarithmic scale.]

Interpretation of Semi-Logarithmic or Ratio Curves.


1. If the curve is rising upwards, the rate of growth or
increase is positive and a curve falling downwards indicates a
negative rate of growth, i.e., a decrease.


2. If the curve is nearly a straight line which is ascending, it
represents the series increasing, more or less, at a constant rate.
Similarly, a nearly straight line curve which is descending, i.e.,
moving downwards, represents a series which is decreasing, more or
less, at a uniform rate.


3. If the curve rises (falls) more steeply at one point of time than
at another, it depicts a more rapid rate of increase (decrease) at that
point than at the other point.


4. If two curves on the same semi-logarithmic graph are
parallel to each other, they represent equal percentage rates of
change for each phenomenon.


5. If one curve is steeper than the other on the same ratio
chart, it implies that the first is changing at a faster rate than the
second.
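The four shapes described earlier and the reading of a straight line as a constant rate can be checked numerically: on a semi-logarithmic graph the rise per period is the difference of successive logarithms. The series below are made up purely for illustration and the function name is an assumption.

```python
import math

def log_steps(series):
    """Rise of the semi-logarithmic graph over each period."""
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

up_amount = [100 + 50 * t for t in range(6)]    # +50 every period
up_rate = [100 * 1.5 ** t for t in range(6)]    # +50% every period
down_amount = [350 - 50 * t for t in range(6)]  # -50 every period
down_rate = [100 * 0.8 ** t for t in range(6)]  # -20% every period

for name, series in [("constant amount up", up_amount),
                     ("constant rate up", up_rate),
                     ("constant amount down", down_amount),
                     ("constant rate down", down_rate)]:
    print(name, [round(s, 4) for s in log_steps(series)])
```

Equal steps mean a straight line (constant rate); steps that shrink while positive give the curve concave to the base, and steps that become more negative give the falling curve whose slope increases.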


We now give below, some illustrations of the use of ratio or 
semi-logarithmic graphs. 


Example 4.39. A firm reported that its net worth in the years
1970-71 to 1974-75 was as follows :


Year 1970-71 1971-72 1972-73 1973-74 1974-75
Net worth 100 112 120 133 147


Plot the above data in the form of a semi-logarithmic graph. 

Can you say anything about the approximate rate of growth of its net 
worth ? 

[Delhi Uni. B.A. Econ. (Hons.), 1978] 


Solution. To plot the above data on a semi-logarithmic scale
we plot the logarithms of the dependent variable (Net Worth)
along the vertical axis on a natural scale. The horizontal axis, as
usual, will represent time variable on a natural scale.


Year       Net Worth (Y)    Log (Y)

1970-71    100              2.00
1971-72    112              2.05
1972-73    120              2.08
1973-74    133              2.12
1974-75    147              2.17


The graph is shown on page 206.

[Fig. 4.63. Semi-log graph (net worth of a firm).]


Comments. Since the graph is ascending throughout, it
reflects a positive rate of growth of the net worth for the entire
period. However, since the graph is steepest for the period 1970-71
to 1971-72, it represents the highest rate of positive growth (increase)
during this period. Then, however, there is slight decline in the rate
of increase for the period 1971-72 to 1972-73. There is again an
increase in the rate of growth for the period 1972-73 to 1974-75 over
the period 1971-72 to 1972-73. Further, since the graph for period
1972-73 to 1974-75 is almost a straight line, it represents a constant
rate of increase during this period.
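The comments can be cross-checked by computing the yearly percentage growth directly, together with one possible answer to the question about the approximate rate of growth. The average compound rate used below is an assumption about what "approximate rate" means here, not the book's stated method; the variable names are illustrative.

```python
import math

net_worth = {"1970-71": 100, "1971-72": 112, "1972-73": 120,
             "1973-74": 133, "1974-75": 147}

years = list(net_worth)
values = [net_worth[y] for y in years]

# Ordinates plotted on the natural vertical scale, to two decimals.
logs = [round(math.log10(v), 2) for v in values]

# Percentage growth over the preceding year.
growth = [100 * (cur / prev - 1) for prev, cur in zip(values, values[1:])]

# Average compound rate over the four yearly intervals (an assumed measure).
average_rate = 100 * ((values[-1] / values[0]) ** (1 / (len(values) - 1)) - 1)

print(logs)
print([round(g, 1) for g in growth], round(average_rate, 1))
```

Growth is highest in the first interval, dips in the second and is nearly equal in the last two, in line with the comments; on this measure the net worth grew at roughly 10 per cent per year.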


Example 4'40. The following table gives the population of 
India at intervals of 10 years : 


OS SES Ta EN MER 


Year Populatian 
a A DOE ES >з су... _ 
1931 27, 88, 67, 430 
1941 31, 85, 39, 060 
1951 36, 09, 50, 365 
1961 43, 90, 72, 582 
1971 54, 79, 49, 809 


Ao uM Ue s o л" ыша. 


Plot the data on a graph paper. From your graph determine the 
decade in which the rate of growih of population was, 
(3) the slowest, 
(ii) the fastest, 
[Bombay Uni. B. Com. (October) 1976] 
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Solution. Since we are interested in determining the rate of 
growth for defferent decades, the appropriate graph will be obtained 
on plotting the data on a semi-logarithmic or ratio scale, Logarithms 
of the population values are plotted along the vertical axis on a 
natural scale and time variable (decades) are plotted along the hori- 
zontal axis on a natural scale, 


Year        Population (Y)     log Y
            (in lakhs)
1931            2789            3.45
1941            3185            3.50
1951            3610            3.56
1961            4391            3.64
1971            5479            3.74


The graph is shown in Fig. 4.64 below.


SEMI - LOG GRAPH 
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Fig. 4.64. 


Comments. Since the graph is ascending throughout, it reflects a positive rate of population growth throughout the entire period. Further, since the graph is steepest during the period 1961-1971, the rate of growth is maximum during this decade. Again, since the graph is least steep for the period 1941-1951, the rate of growth is minimum during this decade.
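The reading of the semi-log graph can be checked numerically. The following is a minimal sketch, not part of the original solution: it takes the population figures (in lakhs) from the table above and computes, for each decade, the rise in log Y (the slope of the semi-log graph) and the percentage growth. The function name `decade_growth` is illustrative only. Unrounded logarithms are used, because the two-decimal values in the table are too coarse to separate the first two decades reliably.

```python
import math

# Population of India in lakhs, as tabulated in Example 4.40.
population = {1931: 2789, 1941: 3185, 1951: 3610, 1961: 4391, 1971: 5479}


def decade_growth(pop):
    """Return {(start, end): (rise in log10 Y, percentage growth)} for each decade."""
    years = sorted(pop)
    result = {}
    for start, end in zip(years, years[1:]):
        rise = math.log10(pop[end]) - math.log10(pop[start])
        pct = (pop[end] / pop[start] - 1) * 100
        result[(start, end)] = (rise, pct)
    return result


growth = decade_growth(population)
for (start, end), (rise, pct) in growth.items():
    print(f"{start}-{end}: rise in log Y = {rise:.4f}, growth = {pct:.1f}%")

# The decade with the smallest (largest) rise in log Y has the slowest (fastest) growth.
slowest = min(growth, key=lambda d: growth[d][0])
fastest = max(growth, key=lambda d: growth[d][0])
print("slowest:", slowest, "fastest:", fastest)
```

The ordering by rise in log Y is the same as the ordering by percentage growth, which is why the slope of a semi-log graph can be read as a rate of growth.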
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4.5. Limitations of Diagrams and Graphs 


Diagrams and graphs are very powerful and effective visual statistical aids for presenting a set of numerical data, but they have their limitations, some of which are outlined below :


(i) Diagrams and graphs help in simplifying the textual and tabulated facts and thus may be regarded as supplementary to statistical tables. They should not be regarded as substitutes for classification, tabulation and some other forms of presentation of a set of numerical data under all circumstances and for all purposes. Julin has very elegantly stated this limitation in the following words :


“Graphic statistic has a role to play of its own ; it is not the servant of numerical statistics but it cannot pretend, on the other hand, to precede or displace it.”


(ii) They give only a general idea of the data so as to make it readily intelligible, and thus furnish only limited and approximate information. For detailed and precise information we have to refer back to the original statistical tables. Accordingly, diagrams and graphs should be used to explain and impress the significance of statistical facts on the general public, who find it difficult to understand and follow the numerical figures. They are, therefore, appealing to a layman who does not have any statistical background but not to a statistician, because they are not amenable to further mathematical treatment and hence are not of much use to him from the analysis point of view.

(iii) They are subjective in character and, therefore, may be interpreted differently by different people. If the same set of data is presented diagrammatically (graphically) on two different scales, the sizes of the diagrams (graphs) so obtained might differ widely and thus might create wrong and misleading impressions on the minds of the people. Hence, they are likely to be misused by unscrupulous and dishonest people to serve their selfish motives during advertisements, publicity, etc. Hence, they should not be accepted at their face value without proper scrutiny and caution.

(iv) Not all diagrams and graphs are easy to construct. Two and three dimensional diagrams, and ratio graphs, require more time and a great amount of expertise and skill for their construction and interpretation, and are not readily perceptible to a non-mathematical person.

(v) In case of large figures (observations), such a presentation 
fails to reveal small differences in them. 

(vi) The choice of a particular diagram or graph to present a given set of data requires great expertise, skill and intelligence on the part of the statistician or the concerned agency engaged in the work. A wrong type of diagram/graph may lead to very fallacious and misleading conclusions. In this context C.W. Lowe writes :


“The important point that must be borne in mind at all times is that the pictorial presentation, chosen for any situation, must depict the true relationship and point out the proper conclusion. Use of an inappropriate chart may distort the facts and mislead the reader. Above all, the chart must be honest.”


(vii) Diagrammatic presentation should be used only for comparison of different sets of data which relate either to the same phenomenon or to different phenomena which are capable of measurement in the same unit. They are not useful if absolute information is to be represented.


EXERCISE 4 (II)
1. (a) Discuss the utility and limitations of graphic method of presenting statistical data. [Delhi Uni., B. Com. (Hons.), 1976]
(b) Discuss the advantages and limitations of representing statistical data by diagrams (including graphs). (Bombay Uni., B. Com., 1975)
(c) Describe briefly the guiding principles for the graphic presentation of the data.
(d) Explain the advantages of graphic representation of statistical data.
(Mysore U. B. Com. April 1982)


2. (a) What are the various types of graphs used for presenting a frequency distribution ? Discuss briefly their (i) construction and (ii) relative merits and demerits.

(b) Explain briefly the various methods that are used for graphical repre- 
sentation of frequency distribution. (Delhi Uni., M.B.A., 1977) 


3. Give an illustration each of the type of data for which you would expect the frequency curve to be :


(i) fairly symmetrical, (ii) positively skewed, (iii) negatively skewed,
(iv) J-shaped, (v) U-shaped.


4. Comment on the following : 


(а) "The wandering of a line is more powerful in its effect on the mind 
than a tabulated statement ; it shows what is happening and what is likely to
take place just as quickly as the eye is capable of working.” —Boddington 

(b) “Graphs are dynamic, dramatic, They may epitomise an epoch, each 
dot a fact, each slope an event, each curve a history ; wherever there are data 
to record, inferences to draw, or facts to tell, graphs furnish the unrivalled 
means whose power we are just beginning to realise and apply.” —Hubbard 


5. Explain clearly the distinctions between “natural scale” and “semi-logarithmic scale” used in the graphical presentation of data.
(Guru Nanak Dev, U. B. Com., 1981) 


6. (a) What do you mean by a false base line ? Explain its utility in graphic representation of statistical data.

(b) "A false base line of a graph is a wrong base line." Comment.

(c) What is a false base line ? Under what circumstances should it be used ? (Nagarjuna U. B. Com. Oct. 1980)

7. Describe briefly the construction of histogram and frequency polygon 
of a frequency distribution and state their uses. 

Prepare a Histogram and a Frequency Polygon from the following data : 

Class : 0—6 6-12  12—18  18—24 24-30  30—36 

Ea 2 4 8 15 20 12 6 


(Delhi, U. B. Com. 1977) 
8. Marks obtained by 50 students in a History paper of full marks 100 are as follows :
78 25 25 40 `3 29 35 42 43 43 
44 20 48 44 43 48 36 46 48 47 
36 60 31 479 93 - 65 68 73 39 12 
60 20 47 49 51 38 49 35 52 61 
34 76 195220 16 70 65 39 60 45 
Arrange the data in a frequency distribution table in class intervals of length 5 units and draw a histogram to present the above data.
[I.C.W.A. (Intermediate) June 1980]
9. Draw a histogram of the following distribution : 
Life of Electric Lamp (Mid. Values) 1010 1030 1050 1070 1090 
(in hours) 
Firm A 10 130 482 360 18 
Firm B 287 105 26 230 352 


10. (а) Represent the following data by Histogram : 

Weight (kg.) 35—40 40—45 45—50 50—55 55—60 60—65 

No. of Persons 12 30 22 30 18 10 
[Delhi U. B. Com. (External) 1982] 


(b) From the following data draw (i) Histogram (ii) Frequency poly- 
gon and (iii) Frequency curve :— 


Wages in Rs. No. of persons 

0—10 2 
10—20 4 
20—30 1 
30—40 15 
40—50 25 
50—60 18 
60—70 15 
70—80 4 
80—90 2 


[Punjab U. B.A. (Econ. Hons.) 1980]


11. Draw histogram and frequency polygon to present the following data :
Income (Rs.) No. of Income (Rs.) No. of 
Individuals Individuals 
100—149 21 300 —349 62 
150—199 32 350—399 43 
200 —249 52 400—449 18 
250 —299 105 450 —499 9 


[I.C.W.A. (Intermediate) June, 1981] 


12. From the following data, construct (i) Frequency histogram, (ii) Fre- 
quency Polygon and (iii) Frequency Curve : 


Wages groups (Rs.) :   0-10   10-20   20-30   30-40   40-50
No. of Workers     :     2       4      11      15      25
Wages groups (Rs.) :  50-60   60-70   70-80   80-90
No. of Workers     :    18      15       4       3


(Kurukshetra U. B. Com, 1981) 
13. (a) What are cumulative frequencies ? How do you present them diagrammatically for discrete and continuous distributions ?
(b) What is a cumulative frequency curve ? Mention its kinds. Take an example to illustrate them. (Lucknow U. B. Com. 1982)
(c) Explain the difference between a histogram and frequency polygon. 
What is an ogive curve ? State the purpose for which it is used. 
(Bombay U. B. Com. 1974) 
14. (a) For the following distribution of wages, draw ogive and hence find the value of median.

Monthly Wages    Frequency    Monthly Wages    Frequency
12.5-17.5            2        37.5-42.5            4
17.5-22.5           22        42.5-47.5            6
22.5-27.5           10        47.5-52.5            1
27.5-32.5           14        52.5-57.5            1
32.5-37.5            3
                              Total               63

[I.C.W.A. (Intermediate) Dec. 1977]
Ans. Md = 26 (approx.).
15. Below is given the frequency distribution of marks in Mathematics obtained by 100 students in a class :

Marks    No. of students    Marks    No. of students
20-29           7           60-69           9
30-39          11           70-79          14
40-49          24           80-89           2
50-59          32           90-99           1

Draw the ogive (less than or more than type) for this distribution and use it to determine the median. [I.C.W.A. (Intermediate) June 1981]


16, The following table gives the distribution of the wages of 65 employees 
in a factory : 
Wages in Rs. 
(Equal to or more than) 50 60 70 80 90 100 110 120
Number of Employees 65 57 47 34 17 7 2-50. 
Draw a ‘less than’ curve from the above data, and estimate the number of
employees earning at least Rs. 63 but less than Rs. 75. 
[Delhi U. B. Com. (Hons.) 1978] 
Ans. 15 
17. Draw a less than cumulative frequency curve of the following distribution and find the limits for the central 60% of the distribution from the graph.
х (less than) : 5 10 15 20 25 30 35 40 45 
Ргейпёпсу 2072) 1151729: ТАБ ("54075563901 95 КИП 
(Bombay U. B. Com. May, 1982) 
Ans. 13 to 29-8. 
18. Following is the distribution of marks in law obtained by 50 
students : 
Marks (more than) : 0 10 20 30 40 50
No. of students : 50 46 40 20 10 3


Draw an ogive curve on a graph paper. Also calculate the median marks. (Delhi U. B. Com. 1980)
19. Construct a frequency table for the following data regarding annual profits in thousands of rupees in 50 firms, taking 25-34, 35-44, etc., as class intervals.


28 35 61 29 36 48 57 67 69 50 
48 40 47 42 41 37 51 62 63 33
31 32 35 40 38 37 60 51 54 56 
37 46 42 38 61 59 58 44 39 57 
38 44 45 45 47 38 4 47 47 64 


Construct a less than ogive and find : 
(i) Number of firms having profit between Rs. 37,000 and Rs. 58,000. 
(ii) Profit above which 10% of the firms will have their profits. 
(iii) Middle 50% profit group.
(Bombay Uni. B. Com. May 1978) 
Ans, (i) 30, (ii) Rs. 62,000 (iii) Rs. 39,000 to Rs. 56,000. 


20. What do you mean by a historigram ? How does it differ from a histogram ?


21. (a) What are the different types of graphs commonly used to present time series data ? Bring out their salient features.
(b) Describe briefly
(i) Historigram, (ii) Silhouette or Net Balance Graph,
(iii) Range Graph, (iv) Band Graph,
for presenting time series data.
22. (a) Represent the data relating to consolidated budgetary position of states in India as given below, on a graph paper.

(Rs. crore)
Year        Revenue    Expenditure    Surplus or Deficit
1955-56      560.1        626.4            -66.3
1956-57      577.0        654.4            -77.3
1957-58      705.6        677.3            +24.3
1958-59      742.1        745.8             -3.7
1959-60      833.9        829.9             +4.0

Also depict graphically, the net balance of trade.
(Allahabad U. B. Com. 1979)


23. Represent the following data by means of a time series graph. 
Pins 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 


port 
(Rs.'000)267 269 263 275 270 280 282 272 265 266 
Import 265 
(Rs.'000)307 30 280 260 275 271 280 280 260 


Show also the net balance of trade. 


24. Prepare a graph of the following data by using a false base :
Consumer Price Index Numbers (1960 = 100)

Year         1969   1970   1971   1972   1973   1974
All India     177    186    192    207    250    360
Delhi         185    199    211    222    265    337

(Delhi U. B. Com. 1981)
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25, Present the following information graphically : 
EXPORTS OF IRON ORE 


Year                   :  1972   1973   1974   1975   1976
Quantity
(Million tons)         :  10.5   11.0   12.5   14.0   15.0
Value (Rs. crores)     :  36.0   40.5   50.0   65.5   80.0


26. Present the following data about India by a suitable graph :
PRODUCTION IN MILLION TONS

Year    Rice    Wheat    Pulses    Other Cereals    Total
1962     30      10        10          14             64
1963     32      11         8          18             69
1964     33       8.5      11.5        20             73
1965     35      12        11          20             78
1966     36      10        10          22             78
1967     38      11         9          23             81

Hint. Band Graph.
27. Present the following data by a suitable graph :

MINIMUM AND MAXIMUM PRICE OF GOLD
FOR 10 GMS. FOR THE YEAR 1967

Months      Highest    Lowest     Months       Highest    Lowest
            Price      Price                   Price      Price
            (Rs.)      (Rs.)                   (Rs.)      (Rs.)
January     160.0      152.0      July         175.0      163.2
February    162.2      156.0      August       175.8      160.0
March       165.0      160.3      September    172.2      165.0
April       166.5      162.4      October      178.0      168.0
May         168.2      160.5      November     171.0      105.0
June        170.0      161.9      December     175.5      167.0

Hint. Range Graph.
28. (a) Differentiate between the natural scale and logarithmic scale used in graphic presentation of data. In which cases should the latter scale be used ?
(b) Explain what is meant by semi-logarithmic diagram and discuss its advantages over the natural scale diagram.
(c) Explain briefly how you will interpret the graphs drawn on a semi-logarithmic scale.
(d) What do you understand by a ratio-scale ? Under what situations should ratio charts be drawn ?


29. The following table shows the total sales of Gold Bonds by the Reserve 
Bank of India : 


Months        Rs. ('000)    Months        Rs. ('000)
Oct. 1965       15,560      April 1966       3,250
Nov. 1965       13,170      May 1966         3,570
Dec. 1965       18,740      June 1966        3,620
Jan. 1966       12,450      July 1966        3,140
Feb. 1966        8,320      Aug. 1966        2,580
March 1966       7,540      Sept. 1966       2,540
Represent the data graphically on the logarithmic scale on a piece of graph paper or on plain paper.


30. Plot the following data graphically on the logarithmic scale.

Year        Total notes issued    Total notes in circulation
            (in crores Rs.)       (in crores Rs.)
1965-66          2890                  2866
1966-67          3065                  3020
1967-68          3242                  3194
1968-69          3536                  3497
1969-70          3866                  3843


31. Present the following data graphically and comment on the features thus revealed :


Production of Steel Plates 
(in thousand tons) 


How will the graph look like if the data are plotted on semi-logarithmic 
scale? 


5 
5.1. Introduction. One of the important objectives of statistical analysis is to determine various numerical measures which describe the inherent characteristics of a frequency distribution. The first of such measures is average. The averages are the measures which condense a huge unwieldy set of numerical data into single numerical values which are representative of the entire distribution. In the words of Prof. R.A. Fisher, “The inherent inability of the human mind to grasp in its entirety a large body of numerical data compels us to seek relatively few constants that will adequately describe the data”. Averages are one of such few constants. Averages provide us the gist and give a bird’s eye view of the huge mass of unwieldy numerical data.

Averages are the typical values around which other items of the distribution congregate. They are the values which lie between the two extreme observations (i.e., the smallest and the largest observations) of the distribution and give us an idea about the concentration of the values in the central part of the distribution. Accordingly they are also sometimes referred to as the Measures of Central Tendency. Averages are very much useful :


(i) For describing the distribution in concise manner, 


(ii) For comparative study of different distributions. 

(iii) For computing various other statistical measures such as 
dispersion, skewness, kurtosis and various other basic characteristics 
of a mass of data. 

Remark. Averages are also sometimes referred to as Measures 
of Location since they enable us to locate the position or place of the 
distribution in question. 


We give below some definitions of an average as given by 
different statisticians from time to time, 
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WHAT THEY SAY ABOUT AVERAGES—SOME 
DEFINITIONS 


“Averages are statistical constants which enable us to comprehend in a single effort the significance of the whole.”
A.L. Bowley

“An average is a single value selected from a group of values to represent them in some way, a value which is supposed to stand for whole group of which it is part, as typical of all the values in the group.”
A.E. Waugh

“An average value is a single value within the range of the data that is used to represent all of the values in the series. Since an average is somewhere within the range of the data, it is sometimes called a measure of central value.”
Croxton and Cowden

“An average is sometimes called a measure of central tendency because individual values of the variable usually cluster around it. Averages are useful, however, for certain types of data in which there is little or no central tendency.”
Crum and Smith

“Statistical analysis seeks to develop concise summary figures which describe a large body of quantitative data. One of the most widely used set of summary figures is known as measures of location, which are often referred to as averages, measures of central tendency or central location. The purpose for computing an average value for a set of observations is to obtain a single value which is representative of all the items and which the mind can grasp simply and quickly. The single value is the point or location around which the individual items cluster.”
Lawrence J. Kaplan


5.2. Requisites of a Good Average or Measure of Central Tendency. According to Prof. Yule, the following are the desiderata (requirements) to be satisfied by an ideal average or measure of central tendency :


(i) It should be rigidly defined, i.e., the definition should be clear and unambiguous so that it leads to one and only one interpretation by different persons. In other words, the definition should not leave anything to the discretion of the investigator or the observer. If it is not rigidly defined then the bias introduced by the investigator will make its value unstable and render it unrepresentative of the distribution.

(ii) It should be easy to understand and calculate even for a non-mathematical person. In other words, it should be readily comprehensible, should be computed with sufficient ease and rapidity, and should not involve heavy arithmetical calculations. However, this should not be accomplished at the expense of accuracy or some other advantages which an average may possess.


(iii) It should be based on all the observations. Thus in the computation of an ideal average the entire set of data at our disposal should be used and there should not be any loss of information resulting from not using the available data. Obviously, if the whole data is not used in computing the average, it will be unrepresentative of the distribution.


(iv) It should be suitable for further mathematical treatment. In other words, the average should possess some important and interesting mathematical properties so that its use in further statistical theory is enhanced. For example, if we are given the averages and sizes (frequencies) of a number of different groups then for an ideal average we should be in a position to compute the average of the combined group. If an average is not amenable to further algebraic manipulation, then obviously its use will be very much limited for further applications in statistical theory.


(v) It should be affected as little as possible by fluctuations of 
sampling. By this we mean that if we take independent random 
samples of the same size from a given population and compute the 
average for each of these samples then, for an ideal average, the 
values so obtained from different samples should not vary much from 
one another. The difference in the values of the average for different 
samples is attributed to the so called fluctuations of sampling. This 
property is also explained by saying that an ideal average should 
possess sampling stability.

(vi) It should not be affected much by extreme observations. By extreme observations we mean very small or very large observations. Thus a few very small or very large observations should not unduly affect the value of a good average.


5.3. Various Measures of Central Tendency. The following are the five measures of central tendency or measures of location which are commonly used in practice.


(i) Arithmetic Mean or simply Mean.
(ii) Median.
(iii) Mode.
(iv) Geometric Mean.
(v) Harmonic Mean.
In the following sections we shall discuss them in detail one by one.


5.4. Arithmetic Mean. Arithmetic mean of a given set of observations is their sum divided by the number of observations. For example, the arithmetic mean of 5, 8, 10, 15, 24 and 28 is

$\frac{5+8+10+15+24+28}{6} = \frac{90}{6} = 15$

In general, if $X_1, X_2, \ldots, X_n$ are the given n observations, then their arithmetic mean, usually denoted by $\bar{X}$, is given by :

$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} = \frac{\sum X}{n}$   ...(5.1)

where $\sum X$ is the sum of the observations.


In case of frequency distribution :

X :  $X_1$   $X_2$   ...   $X_n$
f :  $f_1$   $f_2$   ...   $f_n$

the arithmetic mean $\bar{X}$ is given by :

$\bar{X} = \frac{(X_1 + X_1 + \ldots\ f_1 \text{ times}) + (X_2 + X_2 + \ldots\ f_2 \text{ times}) + \ldots + (X_n + X_n + \ldots\ f_n \text{ times})}{f_1 + f_2 + \ldots + f_n}$

$\quad = \frac{f_1 X_1 + f_2 X_2 + \ldots + f_n X_n}{f_1 + f_2 + \ldots + f_n} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$   ...(5.2)

where $N = \sum f$ is the total frequency.


In case of continuous or grouped frequency distribution, the value of X is taken as the mid-value of the corresponding class.


Remark. The symbol $\sum$ is the letter capital sigma of the Greek alphabet and is used in mathematics to denote the sum of values.

Steps for the Computation of Arithmetic Mean 


(i) Multiply each value of X or the mid-value of the class (in case of grouped or continuous frequency distribution) by the corresponding frequency f.


(ii) Obtain the total of the products obtained in step (i) above to get $\sum fX$.


(iii) Divide the total obtained in step (ii) by $N = \sum f$, the total frequency.


The resulting value gives the arithmetic mean. 


Example 5.1. A random sample of 10 boys had the following intelligence quotients (I.Q.'s).


70, 120, 110, 101, 88, 83, 95, 98, 107, 100
Find the mean I.Q.
Solution. Mean I.Q. ($\bar{X}$) is given by :


$\bar{X} = \frac{\sum X}{n} = \frac{1}{10}(70+120+110+101+88+83+95+98+107+100) = \frac{972}{10} = 97.2$


Example 5.2. The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange :


Number of Calls :   0    1    2    3    4    5    6    7
Frequency       :  14   21   25   43   51   40   39   12

Obtain the mean number of calls per minute.


Solution. Let the variable X denote the number of calls received per minute at the exchange.


COMPUTATION OF MEAN NUMBER OF CALLS

No. of Calls (X)    Frequency (f)      fX
       0                 14             0
       1                 21            21
       2                 25            50
       3                 43           129
       4                 51           204
       5                 40           200
       6                 39           234
       7                 12            84
     Total             N = 245     $\sum fX$ = 922


Mean number of calls per minute at the exchange is given by :


$\bar{X} = \frac{\sum fX}{N} = \frac{922}{245} = 3.763$
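Example 5.3. Calculate the mean for the following frequency distribution.

Marks              :  0-10   10-20   20-30   30-40   40-50   50-60   60-70
Number of students :    6       5       8      15       7       6       3


Solution.
COMPUTATION OF ARITHMETIC MEAN

Marks     Mid-value (X)    Number of students (f)      fX
0-10            5                   6                   30
10-20          15                   5                   75
20-30          25                   8                  200
30-40          35                  15                  525
40-50          45                   7                  315
50-60          55                   6                  330
60-70          65                   3                  195
Total                            N = 50           $\sum fX$ = 1670

$\bar{X} = \frac{\sum fX}{N} = \frac{1670}{50} = 33.4$ marks.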
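The three steps for the direct method can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `grouped_mean` is hypothetical, and the last class frequency of Example 5.3 is taken as 3, the value implied by N = 50 and the mean of 33.4 marks obtained in Example 5.4.

```python
def grouped_mean(classes, freqs):
    """Arithmetic mean of a grouped distribution by formula (5.2).

    classes: list of (lower, upper) class limits
    freqs:   list of class frequencies f
    Returns (mean, sum of fX, total frequency N).
    """
    # For grouped data, X is taken as the mid-value of each class.
    mids = [(lower + upper) / 2 for lower, upper in classes]
    # Steps (i) and (ii): multiply each X by its f and total the products.
    sum_fx = sum(f * x for f, x in zip(freqs, mids))
    # Step (iii): divide by the total frequency N.
    n = sum(freqs)
    return sum_fx / n, sum_fx, n


# Data of Example 5.3 (last frequency assumed to be 3, see the note above).
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [6, 5, 8, 15, 7, 6, 3]

mean, sum_fx, n = grouped_mean(classes, freqs)
print(f"N = {n}, sum fX = {sum_fx}, mean = {mean:.1f} marks")
```

For an ungrouped frequency distribution such as Example 5.2, the same function can be used by passing each value as a degenerate class, e.g. `(3, 3)` for X = 3.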
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5.4.1. Step Deviation Method for Computing Arithmetic 
Mean. It may be pointed out that the formula (5.2) can be used 
conveniently if the values of X or/and f are small. However, if the values of X or/and f are large, the calculation of mean by the formula (5.2) is quite tedious and time consuming. In such a case the calculations can be reduced to a great extent by using the step deviation method, which consists in taking the deviations (differences) of the given observations from any arbitrary value A. Let


$d = X - A$   ...(5.3)
then $\bar{X} = A + \frac{\sum fd}{N}$   ...(5.4)


This formula is much more convenient to use for numerical 
problems than the formula (5.2). 

In case of grouped or continuous frequency distribution, with class intervals of equal magnitude, the calculations are further simplified by taking :

$d = \frac{X - A}{h}$   ...(5.5)


where X is the mid-value of the class and h is the common magnitude of the class intervals. Then


$\bar{X} = A + \frac{h \sum fd}{N}$   ...(5.6)

Steps. (i) Compute d = (X - A)/h, A being any arbitrary number and h the common magnitude of the classes. Algebraic signs + or - are to be taken with the deviations.

(ii) Multiply d by the corresponding frequency f to get fd.

(iii) Find the sum of the products obtained in step (ii) to get $\sum fd$.

(iv) Divide the sum obtained in step (iii) by N, the total frequency, and multiply the quotient by h.

(v) Add A to the value obtained in step (iv).

The resulting value gives the arithmetic mean of the given distribution.

Remarks 1. If we take h = 1, then formula (5.6) reduces to formula (5.4).

2. Any number can serve the purpose of the arbitrary constant ‘A’ used in (5.4) and (5.6), but generally the value of X corresponding to the middle part of the distribution will be more convenient. In fact, ‘A’ need not necessarily be one of the values of X.
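Example 5.4. Compute the arithmetic mean of the data in Example 5.3 by using the step deviation method.


Solution. Here h = 10, and we take A = 35.


COMPUTATION OF A.M. BY STEP DEVIATION METHOD

Marks     Mid-value (X)     f      d = (X - 35)/10      fd
0-10            5            6           -3             -18
10-20          15            5           -2             -10
20-30          25            8           -1              -8
30-40          35           15            0               0
40-50          45            7            1               7
50-60          55            6            2              12
60-70          65            3            3               9
Total                     N = 50                   $\sum fd$ = -8

$\bar{X} = A + \frac{h \sum fd}{N} = 35 + \frac{10 \times (-8)}{50} = 35 - 1.6 = 33.4$ marks.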
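The step deviation method can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `step_deviation_mean` is hypothetical, the mid-values and frequencies are those of Example 5.3 with the last frequency taken as 3 (the value consistent with N = 50 and the result of 33.4 marks), and the second call uses an arbitrary origin A = 25 chosen here only to show that the answer does not depend on A.

```python
def step_deviation_mean(mids, freqs, a, h):
    """Arithmetic mean by formula (5.6): mean = A + h * (sum of fd) / N.

    mids:  class mid-values X
    freqs: class frequencies f
    a:     arbitrary origin A
    h:     common class width
    Returns (mean, sum of fd).
    """
    # Step (i): step deviations d = (X - A) / h, keeping their signs.
    d = [(x - a) / h for x in mids]
    # Steps (ii) and (iii): total of the products fd.
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    # Steps (iv) and (v): divide by N, rescale by h, shift back by A.
    n = sum(freqs)
    return a + h * sum_fd / n, sum_fd


mids = [5, 15, 25, 35, 45, 55, 65]
freqs = [6, 5, 8, 15, 7, 6, 3]

mean_35, sum_fd_35 = step_deviation_mean(mids, freqs, a=35, h=10)
mean_25, sum_fd_25 = step_deviation_mean(mids, freqs, a=25, h=10)
print(f"A = 35: sum fd = {sum_fd_35}, mean = {mean_35:.1f}")
print(f"A = 25: sum fd = {sum_fd_25}, mean = {mean_25:.1f}")
```

A value of A near the middle of the distribution keeps the deviations d small, which is the only reason for preferring it in hand computation.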

5.4.2. Mathematical Properties of Arithmetic Mean. Arithmetic mean possesses some very interesting and important mathematical properties as given below :

Property 1. The algebraic sum of the deviations of the given set of observations from their arithmetic mean is zero.


Mathematically,
$\sum (X - \bar{X}) = 0$   ...(5.7)
or for a frequency distribution
$\sum f(X - \bar{X}) = 0$   ...(5.7a)


Remarks 1. In computing the algebraic sum of deviations, we take into consideration the plus and minus signs of the deviations $(X - \bar{X})$, as against the absolute deviations (cf. Mean Deviation in Chapter 6) where we ignore the signs of the deviations.

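2. Verification of Property 1 for the data of Example 5.3, for which $\bar{X} = 33.4$.

ALGEBRAIC SUM OF DEVIATIONS FROM MEAN

Marks     Mid-value (X)     f      $X - \bar{X}$     $f(X - \bar{X})$
0-10            5            6        -28.4            -170.4
10-20          15            5        -18.4             -92.0
20-30          25            8         -8.4             -67.2
30-40          35           15          1.6              24.0
40-50          45            7         11.6              81.2
50-60          55            6         21.6             129.6
60-70          65            3         31.6              94.8
Total                     N = 50              $\sum f(X - \bar{X}) = 0$


Thus $\sum f(X - \bar{X}) = 0$, as required.


It should be kept in mind that in case of the values of the variable when no frequencies are given, we will get $\sum (X - \bar{X}) = 0$.


As a simple illustration, let us consider the following case.


Hence $\sum (X - \bar{X}) = 0$.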


Property 2. Mean of the Combined Series. If we know the sizes and means of two component series, then we can find the mean of the resultant series obtained on combining the given series.


If $n_1$ and $n_2$ are the sizes and $\bar{X}_1$, $\bar{X}_2$ are the respective means of two series, then the mean $\bar{X}$ of the combined series of size $n_1 + n_2$ is given by :


$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$   ...(5.8)


Remark. The above result can be generalised to the case of more than two series. If we have k series with respective sizes $n_1, n_2, \ldots, n_k$ and means $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ respectively, then the mean $\bar{X}$ of the combined series of size $n_1 + n_2 + \ldots + n_k$ is given by :

$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \ldots + n_k \bar{X}_k}{n_1 + n_2 + \ldots + n_k}$   ...(5.8a)
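The combined-mean formula is simply a weighted average of the component means, with the series sizes as weights. The sketch below is illustrative only: the function name `combined_mean` and the two series (40 observations with mean 50 and 60 observations with mean 60) are hypothetical and do not come from the text.

```python
def combined_mean(sizes, means):
    """Mean of the combined series: sum(n_i * mean_i) / sum(n_i)."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    # n_i * mean_i recovers the total of the i-th series.
    grand_total = sum(n * m for n, m in zip(sizes, means))
    return grand_total / sum(sizes)


# Hypothetical figures: two series of sizes 40 and 60 with means 50 and 60.
result = combined_mean([40, 60], [50, 60])
print(result)
```

Note that the combined mean (56 here) is not the simple average of the two means (55) unless the series are of equal size.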


Property 3. The sum of the squares of deviations of the given set of observations is minimum when taken from the arithmetic mean.


Mathematically, for a given frequency distribution, the sum
$S = \sum f(X - A)^2$,   ...(5.9)


which represents the sum of the squares of deviations of given observations from any arbitrary value ‘A’, is minimum when


$A = \bar{X}$

Illustration of Property 3. Let us consider the values of the variable X as 1, 2, 3, 4, 5, 6, 7, for which $\bar{X} = 28/7 = 4$.


SUM OF SQUARED DEVIATIONS FROM MEAN

X        $X - \bar{X}$     $(X - \bar{X})^2$
1            -3                  9
2            -2                  4
3            -1                  1
4             0                  0
5             1                  1
6             2                  4
7             3                  9
Total   $\sum (X - \bar{X}) = 0$    $\sum (X - \bar{X})^2 = 28$


The sum of the squared deviations of the given observations from their mean in the above case is $\sum (X - \bar{X})^2 = 28$. If for the above case we take the deviations of the values of X from any arbitrary point A ($A \neq \bar{X}$) and then compute the sum of squared deviations about A, viz., $\sum (X - A)^2$, $A \neq \bar{X}$, then this sum will be greater than 28 for all values of A. Let us in particular take A = 5 (not equal to the mean $\bar{X} = 4$).


SUM OF SQUARED DEVIATIONS ABOUT ARBITRARY POINT A = 5

X        $X - A = X - 5$     $(X - A)^2$
1            -4                 16
2            -3                  9
3            -2                  4
4            -1                  1
5             0                  0
6             1                  1
7             2                  4
Total                    $\sum (X - A)^2 = 35$


Thus $\sum (X - A)^2 = \sum (X - 5)^2 = 35$,


which is greater than the sum of squared deviations about the mean, viz., 28.
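The illustration can be seen as a special case of a general identity. The short derivation below is added as a sketch for the ungrouped case (n observations, no frequencies); it uses only Property 1, $\sum (X - \bar{X}) = 0$.

```latex
\begin{align*}
\sum (X - A)^2
  &= \sum \bigl[(X - \bar{X}) + (\bar{X} - A)\bigr]^2 \\
  &= \sum (X - \bar{X})^2 + 2(\bar{X} - A)\sum (X - \bar{X}) + n(\bar{X} - A)^2 \\
  &= \sum (X - \bar{X})^2 + n(\bar{X} - A)^2
     \qquad \text{since } \sum (X - \bar{X}) = 0 .
\end{align*}
```

The extra term $n(\bar{X} - A)^2$ is never negative and vanishes only when $A = \bar{X}$, which is why the sum is least about the mean. For the values 1, 2, ..., 7 with A = 5 it gives 28 + 7(4 - 5)^2 = 35, in agreement with the table.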


Property 4. We have

$\bar{X} = \frac{\sum X}{N} \Rightarrow \sum X = N\bar{X}$   ...(5.10)

Result (5.10) is quite useful in the following problems :


(a) If we are given the mean wages ($\bar{X}$) of a number of workers (N) in a factory, then using (5.10) we can determine the total wage bill of the factory.


(b) Wrong Observations. Suppose we compute the mean $\bar{X}$ of N observations and later on it is found that one, two or more of the observations were wrongly copied down. It is now required to compute the corrected mean by replacing the wrong observations by the correct ones. By using (5.10), we can obtain the uncorrected sum of the observations, which is given by $N\bar{X}$. From this if we subtract the wrong observations, say, $X'_1$ and $X'_2$, and add the corresponding correct observations, say, $X_1$ and $X_2$, we can obtain the corrected sum of the observations, which will be given by


$N\bar{X} - (X'_1 + X'_2) + (X_1 + X_2)$

Dividing this by N, we get the corrected mean.


In general, if r observations are misread as $X'_1, X'_2, \ldots, X'_r$ while the correct observations are $X_1, X_2, \ldots, X_r$, then the corrected sum of observations is given by


$N\bar{X} - (X'_1 + X'_2 + \ldots + X'_r) + (X_1 + X_2 + \ldots + X_r)$

Dividing this sum by N we get the corrected mean.
For numerical illustration see Example 5.11.
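The correction procedure in (b) can be sketched in code. The function name `corrected_mean` and the figures used (50 observations with a reported mean of 40, one value misread as 35 instead of 53) are hypothetical, chosen only for illustration; they are not the data of Example 5.11.

```python
def corrected_mean(n, reported_mean, wrong, correct):
    """Corrected mean after replacing wrongly recorded observations.

    n:             number of observations N
    reported_mean: mean computed from the faulty data
    wrong:         list of values as wrongly recorded
    correct:       list of the corresponding correct values
    """
    if len(wrong) != len(correct):
        raise ValueError("each wrong value needs exactly one correct value")
    # Result (5.10): the uncorrected total is N times the reported mean.
    uncorrected_sum = n * reported_mean
    # Remove the wrong values and put the correct ones in their place.
    corrected_sum = uncorrected_sum - sum(wrong) + sum(correct)
    return corrected_sum / n


# Hypothetical case: N = 50, reported mean 40, one value read as 35 instead of 53.
new_mean = corrected_mean(50, 40, wrong=[35], correct=[53])
print(f"corrected mean = {new_mean:.2f}")
```

Only the net error, sum(correct) - sum(wrong), matters: the mean shifts by that amount divided by N.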
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Merits. In the light of the properties laid down by Prof. 
Yule for an ideal measure of central tendency, arithmetic mean 
possesses the following merits : 


(i) It is rigidly defined.
(ii) It is easy to calculate and understand.
(iii) It is based on all the observations.


(iv) It is suitable for further mathematical treatment. The 
mean of the combined series is given by (5.8) or (5.8 a). Moreover, 
it possesses many important mathematical properties (Properties 1 to 
4 as discussed earlier) because of which it has very wide applications 
in statistical theory. 

(v) Of all the averages, arithmetic mean is affected least by fluctuations of sampling. This property is explained by saying that arithmetic mean is a stable average.


Demerits. (i) The strongest drawback of arithmetic mean is 
that it is very much affected by extreme observations. Two or three 
very large values of the variable may unduly affect the value of the 
arithmetic mean. Let us consider an industrial complex which 
houses the workers and some big officials like general manager, 
chief engineer, architect etc. The average salary of the workers 
(skilled and unskilled) is, say, Rs. 550 per month. If the salaries 
of the few big bosses (who draw very high salaries) are also 
included, the average wage per worker comes out to be Rs. 750, 
say. Thus, if we say that the average salary of the workers in the 
factory is Rs. 750 p.m., it gives a very good impression and one is 
tempted to think that the workers are well paid and their standard 
of living is good. But the real picture is entirely different. Thus, in 
the case of extreme observations, the arithmetic mean gives a 
distorted picture, is no longer representative of the distribution and 
quite often leads to very misleading conclusions. Thus, while dealing 
with extreme observations, arithmetic mean should be used with 
caution. 
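
The effect described above can be checked numerically. The sketch below uses a hypothetical payroll (95 workers at Rs. 550 and 5 officials at Rs. 4,550; these counts and the officials' salary are assumptions chosen only so that the two averages quoted in the text, Rs. 550 and Rs. 750, are reproduced).

```python
from statistics import mean, median

# Hypothetical payroll (an assumption, not data from the text):
# 95 workers drawing Rs. 550 each and 5 officials drawing Rs. 4,550 each.
workers = [550] * 95
officials = [4550] * 5

mean_workers = mean(workers)               # average for the workers alone
mean_all = mean(workers + officials)       # average once the officials are included
median_all = median(workers + officials)   # positional average, for comparison

print(mean_workers, mean_all, median_all)
```

Five extreme salaries out of a hundred are enough to pull the mean from Rs. 550 to Rs. 750, while the median of the combined payroll stays at the workers' level.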

(ii) Arithmetic mean cannot be used in the case of open end 
classes such as less than 10, more than 70 etc., since for such classes 
we cannot determine the mid-value X of the class intervals unless 
(i) we estimate the end intervals or (ii) we are given the total value 
of the variable in the open end classes. In such cases mode or median 
(discussed later) may be used. 


(iii) It cannot be determined by inspection nor can it be loca- 
ted graphically. 

(iv) Arithmetic mean cannot be used if we are dealing with 
qualitative characteristics which cannot be measured quantitatively 
such as intelligence, honesty, beauty etc. In such cases median 
(discussed later) is the only average to be used. 




(v) Arithmetic mean cannot be obtained if a single 
observation is missing or lost or is illegible, unless we drop it out and 
compute the arithmetic mean of the remaining values. 

(vi) In extremely asymmetrical (skewed) distributions, usually 
arithmetic mean is not representative of the distribution and hence 
is not a suitable measure of location. 


(vii) Arithmetic mean may lead to wrong conclusions if the 
details of the data from which it is obtained are not available. In 
this connection it is worthwhile to quote the words of H. Secrist : 

“If an average is taken as a substitute for the details, then the 
arithmetic mean, in spite of the simplicity and ease of calculation, has 
little to recommend when series are non-homogeneous.” 


The following example will illustrate this view point. 


Let us consider the following marks obtained by two students 
A and B in three tests, viz., terminal test, half-yearly examination 
and annual examination respectively. 

Marks in :    I Test    II Test    III Test    Average marks 
A              55%        60%        65%           60% 
B              65%        60%        55%           60% 

The average marks obtained by each of the two students at the 
end of the year are 60%. If we are given the average marks alone we 
conclude that the level of intelligence of both the students at the end 
of the year is the same. This is a fallacious conclusion since we find from 
the data that student A has improved consistently while student B 
has deteriorated consistently. 


(viii) Arithmetic mean may not be one of the values which 
the variable actually takes and is termed as a fictitious average. 
Sometimes, it may give meaningless results. In this context it is 
interesting to quote the remarks of the ‘Punch’ journal : 

“The figure of 2.2 children per adult female was felt to be in 
some respects absurd, and a Royal Commission suggested that middle 
classes be paid money to increase the average to a rounder and more 
convenient number.” 


Example 5.5. Marks scored by 50 students in a test paper are 
given below : 

30 45 48 55 39 25 31 12 18 21 
54 59 51 33 43 44 10 38 19 26 
41 35 37 41 46 33 51 37 58 58 
17 19 23 26 29 38 57 36 35 44 
43 27 19 48 22 31 47 34 31 15 

Prepare a frequency table with class intervals 10-19, 20-29, 
30-39, ... and calculate the value of the arithmetic mean from the 
frequency table obtained. 

(Bombay U. B. Com. May 1980) 

Solution. 

FREQUENCY DISTRIBUTION OF MARKS 

Class      Frequency    Mid-value    d = (X - 34.5)/10     fd 
              (f)          (X) 
10-19          8           14.5            -2             -16 
20-29          8           24.5            -1              -8 
30-39         15           34.5             0               0 
40-49         11           44.5             1              11 
50-59          8           54.5             2              16 
Total       N = 50                                    Σfd = 3 

$$\text{Mean} = A + \frac{h\,\Sigma fd}{N} = 34.5 + \frac{10 \times 3}{50} = 34.5 + 0.6 = 35.1$$

Hence the average marks scored by 50 students in the test are 
35.1 ≈ 35. 
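
The step-deviation computation of Example 5.5 can be sketched as follows. The first mark is taken as 30, an assumption: it is the reading consistent with the stated class intervals and with the printed result 35.1. The variable names are illustrative only.

```python
from collections import Counter

# Marks of the 50 students (first value assumed to be 30, see lead-in).
marks = [30, 45, 48, 55, 39, 25, 31, 12, 18, 21,
         54, 59, 51, 33, 43, 44, 10, 38, 19, 26,
         41, 35, 37, 41, 46, 33, 51, 37, 58, 58,
         17, 19, 23, 26, 29, 38, 57, 36, 35, 44,
         43, 27, 19, 48, 22, 31, 47, 34, 31, 15]

h = 10       # class width
A = 34.5     # assumed mean: mid-value of the class 30-39

# Tally each mark into its class, keyed by the lower limit 10, 20, 30, ...
freq = Counter((m // h) * h for m in marks)

N = sum(freq.values())
# Step deviation d = (mid-value - A) / h, where mid-value = lower limit + 4.5
sum_fd = sum(f * ((lower + 4.5 - A) / h) for lower, f in freq.items())
grouped_mean = A + h * sum_fd / N

print(dict(sorted(freq.items())), sum_fd, grouped_mean)
```

The mean of the raw marks will generally differ slightly from the grouped mean, because grouping replaces every mark by the mid-value of its class.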

Example 5.6. In the following grouped data, X are the 
mid-values of the class intervals and c is a constant. If the arithmetic 
mean of the original distribution is 35.84, find its class intervals. 

X - c :   -21   -14    -7     0     7    14    21    Total 
f     :     2    12    19    29    20    13     5     100 

(Bombay U. B. Com. May 1982) 

Solution. Here X - c is the deviation d from the arbitrary point 
c, i.e., d = X - c. Hence the mean of the distribution is given by : 

$$\bar{X} = c + \frac{\Sigma fd}{N} = c + \frac{\Sigma f(X - c)}{N} \qquad \ldots(*)$$

where N = Σf. 

COMPUTATION OF MEAN AND CLASS INTERVALS 

X - c      f      f(X - c)    Mid-value X    Class interval 
 -21       2        -42           14           10.5-17.5 
 -14      12       -168           21           17.5-24.5 
  -7      19       -133           28           24.5-31.5 
   0      29          0           35           31.5-38.5 
   7      20        140           42           38.5-45.5 
  14      13        182           49           45.5-52.5 
  21       5        105           56           52.5-59.5 
Total    100    Σf(X - c) = 84 

Using (*) we get 

$$\bar{X} = c + \frac{84}{100} = 35.84 \ \text{(Given)} \;\Rightarrow\; c = 35.84 - 0.84 = 35$$

$$X - c = 0 \;\Rightarrow\; X = c = 35$$

Thus the mid-value of the class corresponding to the value X - c = 0 
is 35. Further, since the magnitude of the class interval is 7, the 
corresponding class interval is obtained on adding and subtracting 
7/2 = 3.5 from 35 and is given by (35 - 3.5, 35 + 3.5), i.e., 
31.5-38.5. The class intervals are given in the last column of the 
above table. 
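
A minimal sketch of the reasoning in Example 5.6: recover c from the given mean, then rebuild the class intervals. The deviations are taken in steps of 7 (the class magnitude stated in the solution) and the first frequency as 2 (so that the total is 100); both are assumptions about the partly illegible table.

```python
deviations = [-21, -14, -7, 0, 7, 14, 21]   # values of X - c (assumed spacing of 7)
freqs = [2, 12, 19, 29, 20, 13, 5]          # first frequency assumed to be 2
given_mean = 35.84

N = sum(freqs)
sum_fd = sum(f * d for f, d in zip(freqs, deviations))

# From mean = c + sum f(X - c) / N
c = round(given_mean - sum_fd / N, 2)

# Mid-values are c + d; each class extends half a class width either side.
width = deviations[1] - deviations[0]
intervals = [(c + d - width / 2, c + d + width / 2) for d in deviations]

print(N, sum_fd, c)
print(intervals)
```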


Example 5.7. From the following data of income distribution 
calculate the arithmetic mean. It is given that (i) the total income of 
persons in the highest group is Rs. 435, and (ii) none is earning less 
than Rs. 20. 

Income (Rs.)    No. of persons      Income (Rs.)    No. of persons 
Below 30             16             Below 70             87 
  "   40             36               "   80             95 
  "   50             61             80 and over           5 
  "   60             76 

(Kurukshetra U. B. Com. 1975) 


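Solution. The open class "Income below 30" includes the 
persons with income less than Rs. 30. But since we are given that 
none is earning less than Rs. 20, this class will be 20-30. Moreover, 
we are given the cumulative frequency distribution which has to be 
converted into the ordinary frequency distribution as given in the 
following table : 

COMPUTATION OF ARITHMETIC MEAN 

Income (Rs.)    No. of persons (f)    Mid-value (X)       fX 
20-30                 16                  25              400 
30-40           36 - 16 = 20              35              700 
40-50           61 - 36 = 25              45             1125 
50-60           76 - 61 = 15              55              825 
60-70           87 - 76 = 11              65              715 
70-80           95 - 87 =  8              75              600 
80 and over            5                  ..              435* 
Total              Σf = 100                         ΣfX = 4800 

* It is given that total income in the highest group is Rs. 435. 

$$\text{Arithmetic Mean} = \frac{\Sigma fX}{\Sigma f} = \frac{4800}{100} = \text{Rs. } 48.$$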
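
The two steps of this solution (converting 'below' cumulative frequencies into ordinary frequencies, then handling the open-end class through its given total) can be sketched as below. The count of 5 persons in the highest group is inferred from the total of 100 used in the solution; the variable names are illustrative.

```python
upper_limits = [30, 40, 50, 60, 70, 80]
cum_freq = [16, 36, 61, 76, 87, 95]    # persons earning below each upper limit
lowest = 20                            # none earns less than Rs. 20
top_count, top_total = 5, 435          # open-end class: persons and total income

# Ordinary frequencies are successive differences of the cumulative ones.
freqs = [cum_freq[0]] + [b - a for a, b in zip(cum_freq, cum_freq[1:])]

lowers = [lowest] + upper_limits[:-1]
mids = [(lo + up) / 2 for lo, up in zip(lowers, upper_limits)]

# The open-end class contributes its given total instead of f * mid-value.
total_income = sum(f * x for f, x in zip(freqs, mids)) + top_total
N = sum(freqs) + top_count
mean_income = total_income / N

print(freqs, total_income, N, mean_income)
```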

Example 5.8. An investor buys Rs. 1,200 worth of shares in 
a company each month. During the first 5 months he bought the shares 
at a price of Rs. 10, Rs. 12, Rs. 15, Rs. 20 and Rs. 24 per share. 
After 5 months what is the average price paid for the shares by him ? 

[Delhi U. B. Com. (Hons.) 1971 ; B. Com., Pass 71, 73] 

Solution. Let X denote the price (in Rupees) of a share. 
Then the distribution of shares purchased during the first five 
months is as follows : 

Month    Price per share (X)    Total cost (fX)    No. of shares bought (f) 
              (Rs.)                 (Rs.) 
1              10                   1200                  120 
2              12                   1200                  100 
3              15                   1200                   80 
4              20                   1200                   60 
5              24                   1200                   50 
Total                          ΣfX = 6000             Σf = 410 

Hence the average price paid per share for the first five 
months is 

$$\bar{X} = \frac{\Sigma fX}{\Sigma f} = \frac{6000}{410} = \text{Rs. } 14.63$$

Remark. For an alternative solution of this problem see 
Harmonic mean § 5.10. 
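
As a quick check of Example 5.8 and of the remark about the harmonic mean, the sketch below computes the average price as total outlay divided by total shares and compares it with the harmonic mean of the five prices. It uses Python's standard `statistics.harmonic_mean`; the variable names are illustrative.

```python
from statistics import harmonic_mean, mean

prices = [10, 12, 15, 20, 24]    # price per share in each month (Rs.)
outlay = 1200                    # amount invested every month (Rs.)

shares = [outlay / p for p in prices]            # shares bought each month
avg_price = outlay * len(prices) / sum(shares)   # total cost / total shares

print(sum(shares), round(avg_price, 2))
print(round(harmonic_mean(prices), 2), mean(prices))
```

Because the same amount is spent each month, more shares are bought when the price is low, so the average price paid falls below the simple mean of the five prices (Rs. 16.20).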


Example 5.9. For a certain frequency table which has only 
been partly reproduced here, the mean was found to be 1.46. 

Number of accidents      Frequency (Number of days) 
[individual entries illegible in the source] 
Total                    200 

Calculate the missing frequencies. 
(Shivaji U. B. Com. Nov. 1978) 

Solution. Let X denote the number of accidents and let the 
missing frequencies corresponding to X = 1 and X = 2 be f_1 and f_2 
respectively. The known frequencies add up to 86 and contribute 
140 to ΣfX. Since the total frequency is 200, 

$$200 = 86 + f_1 + f_2 \;\Rightarrow\; f_1 + f_2 = 200 - 86 = 114 \qquad \ldots(*)$$

$$\bar{X} = \frac{\Sigma fX}{N} = \frac{f_1 + 2f_2 + 140}{200} = 1.46 \ \text{(Given)}$$

$$\Rightarrow\; f_1 + 2f_2 + 140 = 1.46 \times 200 = 292$$

$$\Rightarrow\; f_1 + 2f_2 = 292 - 140 = 152 \qquad \ldots(**)$$

Subtracting (*) from (**) we get : 

$$f_2 = 152 - 114 = 38$$

Substituting in (*) we get 

$$f_1 = 114 - f_2 = 114 - 38 = 76$$
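
The pair of equations (*) and (**) can be solved mechanically. The sketch below starts from the two totals used in the solution (known frequencies summing to 86 and contributing 140 to ΣfX); the individual table entries are not needed, and the variable names are illustrative.

```python
N, given_mean = 200, 1.46
known_f, known_fx = 86, 140     # totals over the classes whose frequencies are given

s1 = N - known_f                        # f1 + f2
s2 = round(given_mean * N) - known_fx   # f1 + 2*f2  (round guards float error)

f2 = s2 - s1        # subtracting (*) from (**)
f1 = s1 - f2        # substituting back in (*)

print(f1, f2)
```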
Example 5.10. The following are the monthly salaries in 
rupees of 20 employees of a firm : 

130  62 145 118 125  76 151 142 110  98 
 65 116 100 103  71  85  80 122 132  95 

The firm gives bonuses of Rs. 10, 15, 20, 25 and 30 for 
individuals in the respective salary groups exceeding Rs. 60 but not 
exceeding Rs. 80, exceeding Rs. 80 but not exceeding Rs. 100, and so on 
up to exceeding Rs. 140 but not exceeding Rs. 160. Find the average 
bonus paid per employee. 
[Guru Nanak Dev. U. B. Com. 1978] 

Solution. First we shall express the given data in the form 
of a grouped frequency distribution with salaries (in Rupees) in the 
class intervals 61-80, 81-100, 101-120, 121-140 and 141-160. 

The first value in the above distribution is 130, so we put a 
tally mark against the class interval 121-140 ; next value is 62, so 
we put a tally mark against the class 61-80 and so on. Thus the 
grouped frequency distribution is as follows : 

COMPUTATION OF AVERAGE BONUS PER EMPLOYEE 

Salary (in Rs.)    Frequency (f)    Bonus in Rs. (X)     fX 
61-80                   5                 10             50 
81-100                  4                 15             60 
101-120                 4                 20             80 
121-140                 4                 25            100 
141-160                 3                 30             90 
Total               Σf = 20                        ΣfX = 380 

$$\text{Average bonus paid per employee} = \frac{\Sigma fX}{\Sigma f} = \frac{380}{20} = \text{Rs. } 19$$
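
The grouping rule "exceeding Rs. 60 but not exceeding Rs. 80, and so on" can be applied directly to each salary, which gives the same average without building the table. This is a sketch; the helper `bonus_for` is a hypothetical name introduced for illustration.

```python
import math

salaries = [130, 62, 145, 118, 125, 76, 151, 142, 110, 98,
            65, 116, 100, 103, 71, 85, 80, 122, 132, 95]
bonuses = [10, 15, 20, 25, 30]   # for (60, 80], (80, 100], ..., (140, 160]

def bonus_for(salary):
    """Bonus for a salary in (60, 160]; upper class limits are inclusive."""
    group = math.ceil((salary - 60) / 20) - 1
    return bonuses[group]

total_bonus = sum(bonus_for(s) for s in salaries)
average_bonus = total_bonus / len(salaries)

print(total_bonus, average_bonus)
```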


Example 5.11. The mean salary paid to 1,000 employees of 
an establishment was found to be Rs. 180.40. Later on, after 
disbursement of salary, it was discovered that the salary of two employees was 
wrongly entered as Rs. 297 and 165. Their correct salaries were 
Rs. 197 and 185. Find the correct Arithmetic Mean. 

Solution. Let the variable X denote the salary (in rupees) of 
an employee. Then we are given : 

$$\bar{X} = \frac{\Sigma X}{1000} = 180.40 \;\Rightarrow\; \Sigma X = 1000 \times 180.40 = 1,80,400$$

Thus the total salary disbursed to all the employees in the 
establishment is Rs. 1,80,400. After incorporating the corrections 
we have : 

Corrected ΣX = 180400 - (sum of wrong salaries) 
                      + (sum of correct salaries) 
             = 180400 - (297 + 165) + (197 + 185) 
             = 180400 - 462 + 382 
             = 180320 

$$\text{Corrected mean salary} = \frac{180320}{1000} = \text{Rs. } 180.32$$
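
The correction procedure of Example 5.11 generalises to any number of wrongly entered values. A minimal sketch, with `corrected_mean` as a hypothetical helper name:

```python
def corrected_mean(n, reported_mean, wrong_values, correct_values):
    """Recover the total from the reported mean, swap the wrong entries
    for the correct ones, and divide by n again."""
    total = n * reported_mean
    corrected_total = total - sum(wrong_values) + sum(correct_values)
    return corrected_total / n

result = corrected_mean(1000, 180.40, [297, 165], [197, 185])
print(round(result, 2))
```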


Example 5.12. The table below shows the number of skilled 
and unskilled workers in two small communities, together with their 
average hourly wages : 

                       Ram Nagar                    Shyam Nagar 
Worker category    Number   Wage per hour      Number   Wage per hour 
Skilled             150       Rs. 1.80          350       Rs. 1.75 
Unskilled           850       Rs. 1.30          650       Rs. 1.25 

Determine the average hourly wage for each community. Also 
give reasons why the results show that the average hourly wage in 
Shyam Nagar exceeds the average hourly wage in Ram Nagar even though in 
Shyam Nagar the average hourly wage of both categories of workers is 
lower. [Delhi U. B. Com. (Hons.) 1971] 

Solution. Let $n_1$ and $n_2$ denote the number, and $\bar{X}_1$ and $\bar{X}_2$ 
denote the wages (in rupees) per hour of the skilled and unskilled 
workers respectively in the community. Let $\bar{X}$ be the mean wage 
of all the workers in the community. 

Ram Nagar. We have : 

$$n_1 = 150,\ \bar{X}_1 = \text{Rs. } 1.80 ; \qquad n_2 = 850,\ \bar{X}_2 = \text{Rs. } 1.30$$

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{150 \times 1.80 + 850 \times 1.30}{150 + 850} = \frac{270 + 1105}{1000} = \frac{1375}{1000} = \text{Rs. } 1.375$$

Shyam Nagar. We have : 

$$n_1 = 350,\ \bar{X}_1 = \text{Rs. } 1.75 ; \qquad n_2 = 650,\ \bar{X}_2 = \text{Rs. } 1.25$$

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} = \frac{350 \times 1.75 + 650 \times 1.25}{350 + 650} = \frac{612.50 + 812.50}{1000} = \frac{1425}{1000} = \text{Rs. } 1.425$$

Thus we see that the average wage per hour for all the workers 
combined is higher in Shyam Nagar than in Ram Nagar, although 
the average hourly wages of both types of workers are lower in 
Shyam Nagar. The reasons for this somewhat strange looking result 
may be assigned as follows : 

The difference in the average hourly wages in Ram Nagar 
and Shyam Nagar : 

(a) For skilled workers is 1.80 - 1.75 = Rs. 0.05. 
(b) For unskilled workers is 1.30 - 1.25 = Rs. 0.05. 

Thus, although the difference in the wages of skilled and 
unskilled workers in both the communities is the same, viz., Rs. 0.05, the 
number of skilled workers getting relatively higher wages than the 
unskilled workers is much more in Shyam Nagar than in Ram 
Nagar, and the number of unskilled workers getting relatively less 
wages is much less in Shyam Nagar than in Ram Nagar. In fact, 
the ratio of skilled workers to unskilled workers in Ram Nagar is 
150 : 850, i.e., 3 : 17, while in Shyam Nagar it is 350 : 650, i.e., 
7 : 13. 
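
The seemingly strange result of Example 5.12 follows entirely from the different mix of skilled and unskilled workers. The sketch below computes the combined means from (number, mean wage) pairs; `combined_mean` is a hypothetical helper name.

```python
def combined_mean(groups):
    """Pooled mean of several groups given as (count, group mean) pairs."""
    total = sum(n * m for n, m in groups)
    return total / sum(n for n, _ in groups)

# (number of workers, hourly wage in Rs.) for skilled and unskilled workers
ram_nagar = [(150, 1.80), (850, 1.30)]
shyam_nagar = [(350, 1.75), (650, 1.25)]

ram_mean = combined_mean(ram_nagar)
shyam_mean = combined_mean(shyam_nagar)

# Share of skilled workers in each community drives the difference.
ram_skilled_share = 150 / (150 + 850)
shyam_skilled_share = 350 / (350 + 650)

print(round(ram_mean, 3), round(shyam_mean, 3))
print(ram_skilled_share, shyam_skilled_share)
```

Both category wages are lower in Shyam Nagar, yet its pooled mean is higher because 35% of its workers are in the better-paid category against 15% in Ram Nagar.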


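Example 5.13. The mean of marks in Statistics of 100 students 
in a class was 72. The mean of marks of boys was 75, while their 
number was 70. Find out the mean marks of girls in the class. 

(Delhi U. B. Com. 1983) 

Solution. In the usual notations we are given : 

$$n_1 = 70,\ \bar{X}_1 = 75 ;\quad n_1 + n_2 = 100,\ \bar{X} = 72 \;\Rightarrow\; n_2 = 100 - 70 = 30.$$

We want $\bar{X}_2$. We have : 

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} \;\Rightarrow\; 72 = \frac{70 \times 75 + 30\,\bar{X}_2}{100}$$

$$\Rightarrow\; 72 \times 100 = 5250 + 30\,\bar{X}_2 \;\Rightarrow\; 30\,\bar{X}_2 = 7200 - 5250 = 1950$$

$$\Rightarrow\; \bar{X}_2 = \frac{1950}{30} = 65.$$

Hence the mean of marks of girls in the class is 65. 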


Example 5.14. The average monthly wage of all workers in 
a factory is Rs. 444/-. If the average wages paid to male and female 
workers are Rs. 480/- and Rs. 360/- respectively, find the percentage of 
male and female workers employed by the factory. 

[Delhi U. B.A. (Econ. Hons.) 1982] 

Solution. Let $n_1$ and $n_2$ denote respectively the number of 
male and female workers in the factory and $\bar{X}_1$ and $\bar{X}_2$ denote 
respectively their average salary (in Rupees). Let $\bar{X}$ denote the average 
salary of all the workers in the factory. Then we are given that : 

$$\bar{X}_1 = 480,\quad \bar{X}_2 = 360 \quad \text{and} \quad \bar{X} = 444$$

We have : 

$$\bar{X} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2} \;\Rightarrow\; (n_1 + n_2)\,\bar{X} = n_1\bar{X}_1 + n_2\bar{X}_2$$

$$\Rightarrow\; 444\,(n_1 + n_2) = 480\,n_1 + 360\,n_2 \;\Rightarrow\; (480 - 444)\,n_1 = (444 - 360)\,n_2$$

$$\Rightarrow\; 36\,n_1 = 84\,n_2 \;\Rightarrow\; \frac{n_1}{n_2} = \frac{84}{36} = \frac{7}{3}$$

Hence the male workers in the factory are 

$$\frac{7}{7 + 3} \times 100 = \frac{7}{10} \times 100 = 70\%$$

and the female workers in the factory are 

$$\frac{3}{7 + 3} \times 100 = \frac{3}{10} \times 100 = 30\%$$
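
Writing p for the proportion of male workers, the combined-mean formula becomes X̄ = pX̄₁ + (1 − p)X̄₂, so p = (X̄ − X̄₂)/(X̄₁ − X̄₂). The sketch below applies this rearrangement; `share_of_first_group` is a hypothetical helper name.

```python
def share_of_first_group(overall_mean, mean_1, mean_2):
    """Proportion p of group 1 such that p*mean_1 + (1-p)*mean_2 = overall_mean."""
    if mean_1 == mean_2:
        raise ValueError("group means must differ")
    return (overall_mean - mean_2) / (mean_1 - mean_2)

p_male = share_of_first_group(444, 480, 360)
p_female = 1 - p_male

print(round(100 * p_male), round(100 * p_female))
```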
5.5. Weighted Arithmetic Mean. The formulae discussed 
so far in (5.1) to (5.6) for computing the arithmetic mean are based 
on the assumption that all the items in the distribution are of equal 
importance. However, in practice, we might come across situations 
where the relative importance of all the items of the distribution is 
not the same. If some items in a distribution are more important than 
others, then this point must be borne in mind in order that the average 
computed is representative of the distribution. In such cases, 
proper weightage is to be given to various items, the weights 
attached to each item being proportional to the importance of the item 
in the distribution. For example, if we want to have an idea of 
the change in cost of living of a certain group of people, then the 
simple mean of the prices of the commodities consumed by them 
will not do, since all the commodities are not equally important, 
e.g., wheat, rice, pulses, housing, fuel and lighting are more 
important than cigarettes, tea, confectionery, cosmetics etc. 


Let $W_1, W_2, \ldots, W_n$ be the weights attached to variable values 
$X_1, X_2, \ldots, X_n$ respectively. Then the weighted arithmetic mean, 
usually denoted by $\bar{X}_w$, is given by : 

$$\bar{X}_w = \frac{W_1X_1 + W_2X_2 + \ldots + W_nX_n}{W_1 + W_2 + \ldots + W_n} = \frac{\Sigma WX}{\Sigma W} \qquad (5.11)$$

This is precisely the same as formula (5.2) with f replaced by W. 

In case of frequency distribution, if $f_1, f_2, \ldots, f_n$ are the 
frequencies of the variable values $X_1, X_2, \ldots, X_n$ respectively then the 
weighted arithmetic mean is given by : 

$$\bar{X}_w = \frac{W_1(f_1X_1) + W_2(f_2X_2) + \ldots + W_n(f_nX_n)}{W_1 + W_2 + \ldots + W_n} = \frac{\Sigma W(fX)}{\Sigma W} \qquad (5.12)$$

where $W_1, W_2, \ldots, W_n$ are the respective weights of $X_1, X_2, \ldots, X_n$. 
Example 5.15. A candidate obtained the following percentages 
in an examination : English 60 ; Hindi 75 ; Mathematics 63 ; Physics 
59 ; Chemistry 55. Find the candidate's weighted arithmetic mean if 
weights 1, 2, 1, 3, 3 respectively are allotted to the subjects. 

Solution. Let the variable X denote the percentage of marks 
in the examination. 

COMPUTATION OF WEIGHTED MEAN 

Subject        Marks (%) (X)    Weights (W)      WX 
English             60               1            60 
Hindi               75               2           150 
Mathematics         63               1            63 
Physics             59               3           177 
Chemistry           55               3           165 
Total                            ΣW = 10     ΣWX = 615 

$$\text{Weighted Arithmetic Mean (in \%)} = \frac{\Sigma WX}{\Sigma W} = \frac{615}{10} = 61.5$$
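
Formula (5.11) translates directly into a few lines of code. The sketch below checks Example 5.15, with the Mathematics percentage taken as 63 (an assumption: it is the value consistent with the printed total ΣWX = 615); `weighted_mean` is a hypothetical helper name.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# English, Hindi, Mathematics, Physics, Chemistry
marks = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

weighted = weighted_mean(marks, weights)
simple = sum(marks) / len(marks)

print(weighted, simple)
```

The weighted mean falls below the simple mean here because the heaviest weights are attached to the two lowest percentages.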
Example 5.16. Comment on the performance of the students 
in three universities given below using simple and weighted averages : 

[Data table: % of pass and No. of students (in '00s), course-wise, for 
Bombay, Calcutta and Madras Universities. The legible figures are 
reproduced in the computation table of the solution.] 

[C.A. (Intermediate) May 1970] 

Solution. 

COMPUTATION OF SIMPLE AND WEIGHTED AVERAGES 

                        Calcutta                            Madras 
Course      Pass %   No. of students   W2X2     Pass %   No. of students   W3X3 
             (X2)    (in '00s) (W2)              (X3)    (in '00s) (W3) 
M.A.          82           2            164       81           2            162 
M. Com.       76           3            228       76           3.5          266 
B.A.          73           6            438       74           4.5          333 
B. Com.       76           7            532       58           2            116 
B. Sc.        65           3            195       70           7            490 
M. Sc.        60           7            420       73           2            146 
Total        432          28           1977      432          21           1513 

For Bombay the corresponding totals are ΣX1 = 432, ΣW1 = 20 and 
ΣW1X1 = 1451 (the course-wise figures are illegible in the source). 

University    Simple Average            Weighted Average 
Bombay        ΣX1/6 = 432/6 = 72        ΣW1X1/ΣW1 = 1451/20 = 72.55 
Calcutta      ΣX2/6 = 432/6 = 72        ΣW2X2/ΣW2 = 1977/28 = 70.61 
Madras        ΣX3/6 = 432/6 = 72        ΣW3X3/ΣW3 = 1513/21 = 72.05 

On the basis of the simple arithmetic mean, which comes out 
to be the same for each University, viz., 72, we cannot distinguish 
between the pass percentage of the students in the three Universities. 
However, the weighted averages show that the results are the best in 
Bombay University (which has the highest weighted average of 72.55) 
followed by Madras University (which has the weighted average 
72.05), while Calcutta University shows the lowest performance. 

Example 5.17. From the results of two colleges A and B below 
state which of them is better and why ? 

Name of Exam.        College A                College B 
                 Appeared    Passed       Appeared    Passed 
M.A.                300        250          1000        800 
M. Com.             500        450          1200        950 
B.A.               2000       1500          1000        700 
B. Com.            1200        750           800        500 
Total              4000       2950          4000       2950 

(Punjab U. B. Com. April 1983) 

Solution. 

Name of               College A                             College B 
Exam.       Appeared   Passed   Pass %age         Appeared   Passed   Pass %age 
              (W_A)               (X_A)             (W_B)               (X_B) 
M.A.           300       250    250/300 × 100 = 83.33    1000     800    800/1000 × 100 = 80 
M. Com.        500       450    450/500 × 100 = 90       1200     950    950/1200 × 100 = 79.17 
B.A.          2000      1500    1500/2000 × 100 = 75     1000     700    700/1000 × 100 = 70 
B. Com.       1200       750    750/1200 × 100 = 62.5     800     500    500/800 × 100 = 62.5 
Total         4000      2950    2950/4000 × 100 = 73.75  4000    2950    2950/4000 × 100 = 73.75 

On the basis of the given information, it is not possible to 
decide which college is better, since the criterion for ‘better college’ 
is not defined. Let us try to solve this problem by taking the ‘Higher 
Pass Percentage’ as criterion for ‘better college’. 

From the above table, we find that the pass percentage in 
M.A., M. Com. and B.A. is better in college A than in college B 
and in B. Com. the pass percentage is the same in both the colleges. 
The simple arithmetic mean of pass percentages in all the four 
courses is : 

$$\bar{X}_A = \frac{83.33 + 90 + 75 + 62.50}{4} = \frac{310.83}{4} = 77.71$$

$$\bar{X}_B = \frac{80 + 79.17 + 70 + 62.50}{4} = \frac{291.67}{4} = 72.92$$

Since the mean pass percentage is higher for college A than 
for college B, we are tempted to conclude that college A is better 
than college B. However, this conclusion is not valid since the 
average pass percentage is affected by the number of students 
appearing in the examination in different courses. An appropriate 
average would be the weighted average of these pass percentages in 
different courses, the corresponding weights being the number 
of students appearing in the examination. The weighted means 
are : 

$$\bar{X}_{w(A)} = \frac{\Sigma W_A X_A}{\Sigma W_A} = \frac{\text{Total number of students passed in college A}}{\text{Total number of students appeared in college A}} \times 100 = \frac{2950}{4000} \times 100 = 73.75$$

$$\bar{X}_{w(B)} = \frac{\Sigma W_B X_B}{\Sigma W_B} = \frac{\text{Total number of students passed in college B}}{\text{Total number of students appeared in college B}} \times 100 = \frac{2950}{4000} \times 100 = 73.75$$

On comparing the weighted means, we conclude that both the 
colleges A and B are equally good on the basis of the criterion of 
higher pass percentage for all the students taken together. 
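
The contrast between the simple and weighted averages in Example 5.17 can be reproduced as follows. The course-wise figures are those read from the solution table (M.A., M. Com., B.A., B. Com. in that order); the helper names are illustrative.

```python
appeared_a = [300, 500, 2000, 1200]
passed_a = [250, 450, 1500, 750]
appeared_b = [1000, 1200, 1000, 800]
passed_b = [800, 950, 700, 500]

def simple_and_weighted(appeared, passed):
    """Return (simple mean of course pass %, pass % weighted by numbers appeared)."""
    rates = [100 * p / a for a, p in zip(appeared, passed)]
    simple = sum(rates) / len(rates)
    # Weighting each rate by the number appeared reduces to total passed / total appeared.
    weighted = 100 * sum(passed) / sum(appeared)
    return simple, weighted

simple_a, weighted_a = simple_and_weighted(appeared_a, passed_a)
simple_b, weighted_b = simple_and_weighted(appeared_b, passed_b)

print(round(simple_a, 2), round(simple_b, 2))
print(weighted_a, weighted_b)
```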


EXERCISE 5.1 


1. What is a statistical average ? What are the desirable properties for 
an average to possess ? Mention different types of averages and state why the 
arithmetic mean is the most commonly used amongst them. 

(Kurukshetra U., M.B.A. 1977) 

2. What are ‘measures of location’ ? In what circumstances would you 
consider them as the most suitable measures for describing the central tendency 
of a frequency distribution ? [Delhi U. B. Com. (Hons.) 1981] 

3. (a) Explain the properties of a good average. In the light of these 
properties which average do you think is the best and why ? 

(Delhi U., M.B.A., 1974, 76) 

3. (b) What do you mean by an ‘Average’ in Statistics ? Mention the 
essentials of a good average. (Mysore U. B. Com. Nov. 1981) 


4. What do you understand by arithmetic mean ? Discuss its merits and 
demerits. 

5. “The figure of 2.2 children per adult female was felt to be in some 
respects absurd and the Royal Commission suggested that the middle class be 
paid money to increase the average to a rounder and more convenient number.” 

Commenting on the above statement, discuss the limitations of the 
arithmetic average. Also point out the characteristics of a good measure of central 
tendency. 


6. Calculate the average bonus paid per member from the following 
data : 

Bonus (in Rs.) : 50 60 70 80 90 100 110 
No. of persons : 1 3 5 7 6 2 1 
Ans. Rs. 79.60. 


7. Calculate the Mean of the following frequency distribution relating 
to the marks secured by students in Statistics : 


Marks No. of Students Marks No. of Students 
0— 5 1 40—45 20 
5—10 6 45—50 25 
10-15 8 50—55 12 
15—20 9 55—60 7 
20—25 п 60-65 6 
25—30 10 65—70 5 
30—35 10 70—75 4 
35—40 17 75—80 1 
Ans. 39.37 marks. (Delhi U. B. Com. 1976) 
8. The following table gives the population of males at different age 
groups of U.K. and India at the time of the census of 1931. Compare the 
average age of males in the two countries. 


Age group U.K. India 
(years) (in lakhs) (in lakhs) 


above 60 


Ans. Average age of males in U.K. is 29.62 years and in India it is 25.23 
years. 


9. (a) Peter travelled by car for 4 days. He drove 10 hours each day. 
He drove : first day at the rate of 45 km. per hour, second day at the rate of 
40 km. per hour, third day at the rate of 38 km. per hour and fourth day at the 
rate of 37 km. per hour. What was his average speed ? 
[Delhi U. B. Com. (Hons.) 1973] 

Ans. 40 km. p.h. 

(b) Typist A can type a letter in five minutes, typist B in 10 minutes 
and typist C in fifteen minutes. What is the average number of letters typed per 
hour per typist ? (Delhi Uni. B. Com. 1973) 

Ans. Reqd. average = (12+6+4)/3 = 7.33. 

(c) A taxi ride in Delhi costs one rupee for the first kilometre and sixty 
paise for each additional kilometre. The cost for each kilometre is incurred at 
the beginning of the kilometre, so that the rider pays for a whole kilometre. 
What is the average cost for 2¾ kilometres ? 
(Guru Nanak Dev Uni., B. Com. 1977) 

Ans. Average cost for 2¾ kms. = (100+60+60) × 4/11 paise = 80 paise per km. 


10. The mean weight of a student in a group of six students is 119 lbs. 
The individual weights of five of them are 115, 109, 129, 117 and 114 lbs. 
What is the weight of the sixth student ? (Bangalore Uni. B. Com., May 1979) 

Ans. 130 lbs. 
11. Find the average marks of a student from the following table : 

Marks below :    10 20 30 40 50 
No. of students : 25 40 55 60 75 
[Osmania U. B. Com. (Hons.) April 1983] 
Ans. Mean = 20.5. 
(Hint : Take class intervals 0—9, 10—19, ... etc.) 
12. (a) Twelve persons gambled on a certain night. Seven of them lost 
at an average rate of Rs. 10.50 while the remaining five gained at an average 
of Rs. 13.00. Is the information given above correct ? If not, why ? 

[I.C.W.A. (Intermediate) Dec. 1981] 

(b) Goals scored by a hockey team in successive matches are 8515.4,-2, 
4, 0, 5, 5 and 3. What is the number of goals the team must score in the 10th 
match in order that the average comes to 4 goals per match ? 

(Delhi U. B. Com. 1979) 

Ans. 5. 

13. (a) The following are the monthly salaries in rupees of 30 employees 
of a firm : 

91, 139, 126, 119, 100, 87, 65, 77, 99, 95, 108, 127, 86, 148, 116, 76, 69, 88, 
112, 118, 89, 116, 97, 105, 95, 80, 86, 106, 93, 135. 

The firm gave bonuses of Rs. 10, 15, 20, 25, 30, 35, 40, 45 and 50 to 
employees in the respective salary groups : exceeding 60 but not exceeding 70, 
exceeding 70 but not exceeding 80 and so on up to exceeding 140 but not 
exceeding 150. Construct a frequency distribution and find out the average bonus paid 
per employee. (Guru Nanak Dev U. B. Com. Sept. 1981) 

Ans. Average bonus = Rs. 27.50. 

14. Find the class intervals if the arithmetic mean of the following 
distribution is 33 and assumed mean 35 : 

Step deviation -3 -2 —1 0 +1 +2 

Frequency 5 10 25 30 20 10 

Ans. 0—10, 10—20, 20—30, 30—40, 40—50, 50—60. 

15. (a) A certain number of salesmen were appointed in different terri- 
tories and the following data were compiled from their sales reports : 


Sales ('000 Rs.) : 4—8 8—12 12—16 16—20 20—24 24—28 28—32 32—36 36—40 
No. of salesmen : 11 13 16 14 ? 9 17 6 4 
If the average sales is believed to be Rs. 19,920, find the missing 
information. 
Ans. Missing frequency = 10. 
(b) From the following data, find the missing frequency when mean 
Size : 10 12 14 16 18 20 
Frequency: 3 yf T 20 8 5 
(Bangalore U. B. Com. April 1978) 
Ans. 12. 


16. For the two frequency distributions given below the mean calculated 
from the first was 25.4 and that from the second was 32.5. Find the values of 
x and y. 


Class Distribution I Distribution II 

Frequency Frequency 
10—20 20 4 
20—30 15 8 
30—40 10 4 
40—50 x 2x 
50—60 y 


[ICWA (Intermediate) June 1982] 
17. (a) The mean of 200 items was 50. Later on it was discovered that 
two items were wrongly read as 92 and 8 instead of 192 and 88. Find out the 
correct mean. 

Ans. 50.9. 

(b) The average income for a group of 50 persons working in a factory 
was calculated to be Rs. 169. It was later discovered that one figure was 
misread as 134 instead of the correct value 143. Calculate the correct average 
income. 

Ans. Rs. 169.18. 


18, (a) The mean marks got by 300 students in the subject of Statistics 
are 45. The mean of the top 100 of them was found to be 70 and the mean of 
the last 100 was known to be 20. What is the mean of the remaining 100 
students ? 

Ans. 45. 

(b) The mean wage of 100 labourers working in a factory, running two 
shifts of 60 and 40 workers respectively, is Rs. 38. The mean wage of 60 
labourers working in the morning shift is Rs. 40. Find the mean wage of 40 
labourers working in the evening shift. [Delhi Uni. B. Com. (Hons.) 1972] 
Ans. Rs. 35. 
19. The pass result of fifty students who took up a class test is given 
below : 
Marks : 4 5 6 7 8 9 
No. of students : 8 10 9 6 4 3 

If the average marks for all the fifty students were 5.16, find out the 
average marks of the students who failed. 
(Mysore Uni. B. Com. 1977 ; Guru Nanak Dev Uni. B. Com. 1973) 

Ans. 2.1. 


20. (a) Calculate the average daily wages for the workers of two 
factories. 
                          Factory A    Factory B 
No. of wage earners          350 
Average daily wages 
[remaining entries illegible in the source] 
(Osmania University B. Com. III, April 1984) 

(b) X Ltd. has 5 plants producing P.V.C. pipes. The mean monthly 
production of 4 plants for 1982 was 73,000 feet and that of the fifth plant was 
85,000 feet. Calculate the average production of the five plants. 
(Delhi U. B. Com. III, 1984) 


21. (a) The mean weight of 150 students in a certain class is 60 kgms. 
The mean weight of boys in the class is 70 kgms and that of girls is 55 kgms. 
Find the number of boys and the number of girls in the class. 
[B. Com. (Hons.) 1976] 
Ans. Boys = 50, Girls = 100. 

(b) Average monthly production of a certain factory in the first 9 months 
is 2584 units and for the remaining 3 months it is 2416 units. Calculate average 
monthly production for the year. [Delhi U. B. Com. (Hons.) 1979] 

Ans. 2542 units. 

(c) The mean monthly salary paid to all employees in a certain company 
is Rs. 600. The mean monthly salaries paid to the male and female employees 
were Rs. 620 and Rs. 520 respectively. Obtain the percentage of male to female 
employees in the company. [I.C.W.A. (Intermediate) Dec. 1981] 

Ans. Males : 80%, Females : 20%. 
22. (a) There are three Sections in B. Com. 1st Year in a certain college. 
The number of students in each Section and the average marks obtained by them 
in the Statistics paper in the annual examination are as follows : 

Section    Average marks in Statistics    No. of Students 
A                    75                         50 
B                    60                         60 
C                    55                         50 

Find the average marks obtained by the students of all the sections taken 
together. 

Ans. 63.125. 

(b) The mean monthly salary paid to 77 employees in a company was 
Rs. 78. The mean salary of 32 of them was Rs. 75 and that of other 25 was 
Rs. 82. What was the mean salary of the remaining ? 

(Bombay Uni. B. Com., 1977) 
Ans. Rs. 77.80. 

23. Define the weighted arithmetic mean of a set of numbers. 

A candidate obtains the following percentages in an examination : 
Sanskrit 75 ; Mathematics 84 ; Economics 56 ; English 78 ; Politics 57 ; History 
54 ; Geography 47. It is agreed to give double weights to marks in English, 
Mathematics and Sanskrit. What is the weighted and unweighted mean ? 

Ans. Weighted mean = 68.8 ; unweighted mean = 64.43. 

24. A contractor employs three types of workers : male, female and 
children. To a male worker he pays Rs. 16 per day, to a female worker 
Rs. 13 per day and to a child worker Rs. 10 per day. What is the average wage 
per day paid by the contractor if the number of males, females and children is 
20, 15 and 5 respectively ? 

Ans. Rs. 14.12. 
25. Compute the weighted arithmetic mean of the index number from 
the data below : 

Group             Index No.    Weight 
Food                125          7 
Clothing            133          5 
Fuel and Light      141          4 
House Rent          173          1 
Miscellaneous       182          3 
Ans. 141.15 


26. The following table gives the distribution of 100 accidents during 
seven days of the week of a given month. During the particular month there 
are 5 Mondays, Tuesdays and Wednesdays and only four each of the other days. 
Calculate the average number of accidents per day. 

Days         No. of Accidents    Days        No. of Accidents 
Sunday             26            Thursday           8 
Monday             16            Friday            10 
Tuesday            12            Saturday          18 
Wednesday          10 

Ans. 14.13 ≈ 14 
27. Calculate simple and weighted arithmetic averages from the 
following data : 

Designation          Monthly salary (in Rs.)    Strength of the cadre 
Class I Officers             1,500                      10 
Class II Officers              800                      20 
Subordinate staff              500                      70 
Clerical staff                 250                     100 
Lower staff                    100                     150 

(Bombay Uni. B.Com. 1977) 
Ans. Simple Arithmetic Mean = Rs. 630, 
Weighted Arithmetic Mean = Rs. 302.86. 


28. Comment on the performance of the students of three Universities 
given below using an appropriate average : 

[Data table: % of pass and No. of students by course of study for three 
universities, including Shivaji and Dayanand; the entries are illegible in the source.] 

[Kamraj (Madurai) Uni. B. Com., Oct. 1978] 


29. (a) From the results of two colleges A and B given below, state 
which of them is better and why ? 

Name of Exam.       College A              College B 
                Appeared   Passed      Appeared   Passed 
M.A.               60        50          200       160 
M. Com.           100        90          240       190 
B.A.              400       300          200       140 
B. Com.           240       150          160       100 
Total             800       590          800       590 

(Punjab U. B. Com. Sept. 1983) 


30. (a) A travelling salesman made five trips in two months. The record 
of sales is given below : 

Trip     No. of days    Volume of sales    Sales per day 
                           (in Rs.)          (in Rs.) 
1             5             3,000              600 
2             4             1,600              400 
3             3             1,500              500 
4             7             3,500              500 
5             6             4,200              700 
Total        25            13,800            2,700 

The sales manager criticised the salesman's performance as not very 
good since his mean daily sales were only Rs. 540 (2700/5). The salesman 
called this an unfair statement for his daily mean sales were as high as Rs. 552 
(13800/25). What does each average mean here ? Which average seems to be 
more appropriate in this case ? (Delhi Uni. D.M.M., 1975) 

Ans. The manager obtained the simple arithmetic mean of the sales 
per day while the salesman obtained the weighted arithmetic mean. The latter 
(weighted average) seems to be more appropriate. 


5.6. Median. In the words of L.R. Connor : 

“The median is that value of the variable which divides the 
group in two equal parts, one part comprising all the values greater 
and the other, all values less than median”. Thus median of a 
distribution may be defined as that value of the variable which exceeds and 
is exceeded by the same number of observations, i.e., it is the value 
such that the number of observations above it is equal to the number 
of observations below it. Thus, we see that as against arithmetic 
mean which is based on all the items of the distribution, the median 
is only a positional average, i.e., its value depends on the position 
occupied by a value in the frequency distribution. 


Calculation of Median.


Case (I) Ungrouped Data. If the number of observations is
odd, then the median is the middle value after the observations have
been arranged in ascending or descending order of magnitude. For
example, the median of 5 observations 35, 12, 40, 8, 60, i.e., 8, 12,
35, 40, 60, is 35.


In case of even number of observations median is obtained as
the arithmetic mean of the two middle observations after they are
arranged in ascending or descending order of magnitude. Thus, if
one more observation, say, 50 is added to the above five observations
then the six observations in ascending order of magnitude
are : 8, 12, 35, 40, 50, 60. Thus,

Median = Arithmetic mean of two middle terms

$$= \tfrac{1}{2}(35 + 40) = 37.5$$

Remarks. It should be clearly understood that in case of
even number of observations, in fact, any value lying between the
two middle values can serve as a median but it is a convention to
obtain the median by taking the arithmetic mean of the two middle
values.
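
As a quick check of Case (I), the sketch below uses the `median` function from Python's standard `statistics` module, which follows the same convention of averaging the two middle values when the count is even. The list names are illustrative.

```python
from statistics import median

odd_sample = [35, 12, 40, 8, 60]      # five observations from the text
even_sample = odd_sample + [50]       # the same data with 50 added

# Sorting is done internally; shown here only to mirror the text
print(sorted(odd_sample))             # 8, 12, 35, 40, 60
print(sorted(even_sample))            # 8, 12, 35, 40, 50, 60

median_odd = median(odd_sample)       # middle (3rd) value
median_even = median(even_sample)     # mean of the 3rd and 4th values
print(median_odd, median_even)
```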

Case (II) Frequency Distribution. In case of frequency distribution
where the variable takes the values $X_1, X_2, \ldots, X_n$ with
respective frequencies $f_1, f_2, \ldots, f_n$ with $\sum f = N$, total
frequency, median is the size of the $(N+1)/2$th item or observation.
In this case the use of cumulative frequency (c.f.) distribution
facilitates the calculations. The steps involved are :

(i) Prepare the 'less than' cumulative frequency distribution.

(ii) Find N/2.

(iii) See the c.f. just greater than N/2.

(iv) The corresponding value of the variable gives median.

The following example will illustrate the method.

Example 5.18. Eight coins were tossed together and the
number of heads (X) resulting was noted. The operation was repeated
256 times and the frequency distribution of the number of heads is
given below :

No. of heads (X) :  0   1   2    3    4    5    6   7   8
Frequency (f)    :  1   9   26   59   72   52   29  7   1

Calculate median.

Solution.

COMPUTATION OF MEDIAN

X     f     'Less than' c.f.
0     1     1
1     9     1+9 = 10
2     26    10+26 = 36
3     59    36+59 = 95
4     72    95+72 = 167
5     52    167+52 = 219
6     29    219+29 = 248
7     7     248+7 = 255
8     1     255+1 = 256

Here N = 256, so that N/2 = 128.

The cumulative frequency (c.f.) just greater than 128 is 167
and the value of X corresponding to 167 is 4. Hence median
number of heads is 4.
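
The four steps for a discrete frequency distribution can be sketched as below. The function name `discrete_median` is illustrative, and the frequencies are read as 1, 9, 26, 59, 72, 52, 29, 7, 1, which is the reading consistent with the cumulative totals 167, 219, 248, 255 and 256 quoted in the solution.

```python
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete frequency distribution.

    `values` must be in ascending order. Follows the textbook rule:
    take the value whose 'less than' c.f. is just greater than N/2.
    """
    cum_freq = list(accumulate(freqs))   # step (i): 'less than' c.f.
    half = cum_freq[-1] / 2              # step (ii): N/2
    for x, cf in zip(values, cum_freq):
        if cf > half:                    # step (iii): c.f. just greater
            return x                     # step (iv): corresponding value
    raise ValueError("empty distribution")

heads = list(range(9))                        # 0, 1, ..., 8 heads
freqs = [1, 9, 26, 59, 72, 52, 29, 7, 1]      # assumed reading, N = 256
median_heads = discrete_median(heads, freqs)
print(median_heads)
```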
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Case (III). Continuous Frequency Distribution, As 


before, median is the size (value) of the (N+1)/2th observation.
Steps involved for its computation are :

(i) Prepare 'less than' cumulative frequency (c.f.) distribution.
(ii) Find N/2.
(iii) See c.f. just greater than N/2.
(iv) The corresponding class contains the median value and is
called the median class.


The value of median is now obtained by using the interpolation
formula :

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right) \tag{5.13}$$

where
l is the lower limit of the median class,
f is the frequency of the median class,
h is the magnitude or width of the median class,
$N = \sum f$ is the total frequency,
and C is the cumulative frequency of the class preceding the median
class.
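
A minimal sketch of formula (5.13) follows. The function name `grouped_median` and the small four-class distribution are hypothetical, chosen only so that the arithmetic is easy to follow by hand; the classes are assumed to be continuous (exclusive type).

```python
def grouped_median(lower_limits, widths, freqs):
    """Median of a continuous frequency distribution by formula (5.13).

    lower_limits[i], widths[i] and freqs[i] describe the i-th class,
    listed in ascending order.
    """
    n = sum(freqs)
    half = n / 2
    c = 0                                   # c.f. of the preceding classes
    for l, h, f in zip(lower_limits, widths, freqs):
        if c + f > half:                    # c.f. just greater than N/2
            return l + (h / f) * (half - c)
        c += f
    raise ValueError("empty distribution")

# Hypothetical data: classes 0-10, 10-20, 20-30, 30-40
# N = 40, N/2 = 20, median class 20-30 with l = 20, f = 20, C = 10
example = grouped_median([0, 10, 20, 30], [10, 10, 10, 10], [4, 6, 20, 10])
print(example)
```

Passing the widths class by class keeps the sketch usable when the classes are of unequal size.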


Remarks 1. The interpolation formula (5.13) is based on the
following assumptions :


(i) The distribution of the variable under consideration is
continuous with exclusive type classes without any gaps.


(ii) There is an orderly and even distribution of observations
within each class.


However, if the data are given as a grouped frequency distribution
where classes are not continuous, then it must be converted
into a continuous frequency distribution before applying the
formula. This adjustment will affect only the value of l in (5.13).


2. Median will be abbreviated by the symbol Md.


3. The sum of absolute deviations of a given set of observations
is minimum when taken from median. By absolute deviation we mean
the deviation after ignoring the algebraic sign. Thus, if we take
the deviation of the given values of the variable X from an assumed
mean A, then X-A may be positive or negative but its absolute
value, denoted by |X-A| and read as (X-A) modulus or (X-A)
mod, is never negative and we have

$$\sum f\,|X - A| \;\ge\; \sum f\,|X - Md|$$

i.e., the sum of the absolute deviations about any arbitrary point A
is never less than the sum of the absolute deviations about the
median. For further discussion, see Mean Deviation in Chapter 6
on Dispersion.
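
The minimal property in Remark 3 can be checked numerically. The sketch below reuses the five observations 8, 12, 35, 40, 60 from the ungrouped example; the helper `sum_abs_dev` and the search range 0 to 70 are illustrative choices, not part of the text.

```python
from statistics import mean, median

data = [8, 12, 35, 40, 60]

def sum_abs_dev(xs, a):
    """Sum of absolute deviations of the observations xs about the point a."""
    return sum(abs(x - a) for x in xs)

about_median = sum_abs_dev(data, median(data))   # deviations about Md = 35
about_mean = sum_abs_dev(data, mean(data))       # deviations about mean = 31

# Brute-force search over whole-number points A for the smallest sum
best_point = min(range(0, 71), key=lambda a: sum_abs_dev(data, a))

print(about_median, about_mean, best_point)
```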


Merits and Demerits of Median
Merits. (i) It is rigidly defined.


(ii) Median is easy to understand and easy to calculate for a
non-mathematical person.


(iii) Since median is a positional average, it is not affected at
all by extreme observations and as such is very useful in the case of
skewed distributions (cf. Chapter 7), J-shaped or inverted J-shaped
distributions (cf. Chapter 4) such as the distribution of wages,
incomes and wealth. So in case of extreme observations, median is
a better average to use than the arithmetic mean since the latter
gives a distorted picture of the distribution.


(iv) Median can be computed while dealing with a distribution
with open end classes.

(v) Median can sometimes be located by simple inspection and
can also be computed graphically (See Ogive discussed in § 5.6.2.)


(vi) Median is the only average to be used while dealing with
qualitative characteristics which cannot be measured quantitatively
but can still be arranged in ascending or descending order of
magnitude, e.g., to find the average intelligence, average beauty,
average honesty etc., among a group of people.


Demerits. (i) In case of even number of observations for an
ungrouped data, median cannot be determined exactly. We merely
estimate it as the arithmetic mean of the two middle terms. In fact
any value lying between the two middle observations can serve the
purpose of median.

(ii) Median, being a positional average, is not based on each
and every item of the distribution. It depends on all the observations
only to the extent whether they are smaller than or greater
than it ; the exact magnitude of the observations being immaterial.
Let us consider a simple example. The median value of

35, 12, 8, 40 and 60 i.e., 8, 12, 35, 40, 60

is 35. Now if we replace the values 8 and 12 by any two values
which are less than 35 and the values 40 and 60 by any two values
greater than 35 the median is unaffected. This property is sometimes
described by saying that median is insensitive.

(iii) Median is not suitable for further mathematical treatment
i.e., given the sizes and the median values of different groups we
cannot compute the median of the combined group.

(iv) Median is relatively less stable than mean, particularly for
small samples since it is affected more by fluctuations of sampling as
compared with the arithmetic mean.


Example 5.19. In a batch of 15 students, 5 students failed in
a test. The marks of 10 students who passed were 9, 6, 7, 8, 8, 9, 6,
5, 4, 7. What was the median of all the 15 students ?


Solution. The marks of 10 students who passed when arranged
in ascending order of magnitude are :
4, 5, 6, 6, 7, 7, 8, 8, 9, 9.
Since the five students who failed must have scored less than
4 marks, the marks of 15 students when arranged in ascending
order are :

..., ..., ..., ..., ..., 4, 5, 6, 6, 7, 7, 8, 8, 9, 9.        (*)

Here N=15. Hence the median value is the middle value viz.
8th value in the series (*). Hence median is 6.


Example 5.20. The following table shows the age distribution
of persons in a particular region.


Age (years)    No. of persons    Age (years)    No. of persons
               (in thousands)                   (in thousands)
Below 10       2                 Below 50       14
Below 20       5                 Below 60       15
Below 30       9                 Below 70       15.5
Below 40       12                70 and over    15.6


(i) Find the median age.
(ii) Why is the median a more suitable measure of central tendency
than the mean in this case ?

[Delhi Uni. B.A. Econ. (Hons.) 1971]
Solution. (i) First of all we shall convert the given distribution
into the continuous frequency distribution as given below and
then compute the median.


COMPUTATION OF MEDIAN

Age (in years)    Number of persons        'Less than' c.f.
                  in '000 (f)
0-10              2                        2
10-20             5-2 = 3                  5
20-30             9-5 = 4                  9
30-40             12-9 = 3                 12
40-50             14-12 = 2                14
50-60             15-14 = 1                15
60-70             15.5-15 = 0.5            15.5
70 and over       15.6-15.5 = 0.1          15.6


Here N/2 = 15.6/2 = 7.8. Cumulative frequency (c.f.) just greater
than 7.8 is 9. Thus the corresponding class 20-30 is the median class.
Hence using the median formula (5.13) we get :

$$\text{Median} = 20 + \frac{10}{4}\,(7.8 - 5) = 20 + \frac{5}{2} \times 2.8 = 20 + 5 \times 1.4 = 27$$

Hence median age is 27 years.

(ii) In this case median is a more suitable measure of central 
tendency than mean because the last class viz., 70 and over is open 
end class and as such arithmetic mean being a calculated average 
cannot be computed. 
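
The median in part (i) can be checked directly from the cumulative figures, without first differencing them into class frequencies. This is a sketch under two assumptions: the first class is taken to start at age 0, and the 'below 20' figure is taken as 5, the value used for C in the worked solution. The open-ended last class enters only through the total, which is why it causes no difficulty for the median.

```python
# 'Less than' cumulative frequencies (in thousands) at each upper limit
upper_limits = [10, 20, 30, 40, 50, 60, 70]
cum_freq = [2, 5, 9, 12, 14, 15, 15.5]
total = 15.6                      # includes the open class '70 and over'

half = total / 2                  # N/2 = 7.8
prev_limit, prev_cf = 0, 0        # assumed start of the first class
median_age = None
for limit, cf in zip(upper_limits, cum_freq):
    if cf > half:                 # median class found
        width = limit - prev_limit
        class_freq = cf - prev_cf
        median_age = prev_limit + (width / class_freq) * (half - prev_cf)
        break
    prev_limit, prev_cf = limit, cf

print(round(median_age, 2))
```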

Example 5.21. The frequency distribution of weight in grams of
mangoes of a given variety is given below. Calculate the arithmetic
mean and the median.

Weight in     Number of     Weight in     Number of
grams         mangoes       grams         mangoes
410-419       14            450-459       45
420-429       20            460-469       18
430-439       42            470-479       7
440-449       54


[Delhi U. B.A. (Econ. Hons.) 1981]


Solution. Since the interpolation formula for median is
based on continuous frequency distribution we shall first convert
the given inclusive class interval series into exclusive class interval
series.


CALCULATIONS FOR MEAN AND MEDIAN

Weight in grams    No. of         Mid-value    d = (X-444.5)/10    fd      (Less than)
                   mangoes (f)    (X)                                      c.f.
409.5-419.5        14             414.5        -3                  -42     14
419.5-429.5        20             424.5        -2                  -40     34
429.5-439.5        42             434.5        -1                  -42     76
439.5-449.5        54             444.5        0                   0       130
449.5-459.5        45             454.5        1                   45      175
459.5-469.5        18             464.5        2                   36      193
469.5-479.5        7              474.5        3                   21      200
Total              Σf = 200 = N                                    Σfd = -22

$$\text{Mean} = A + \frac{h \sum fd}{N} = 444.5 + \frac{10 \times (-22)}{200} = 444.5 - 1.1 = 443.4 \text{ gms.}$$

Here N/2 = 100. The c.f. just greater than 100 is 130.
Hence the corresponding class 439.5-449.5 is the median class.

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right) = 439.5 + \frac{10}{54}\,(100 - 76) = 439.5 + \frac{10 \times 24}{54} = 439.5 + 4.44 = 443.94 \text{ gms.}$$
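
The step-deviation mean and the median of this example can be re-computed from the class frequencies as a check. Summing the fd column of the table (-42, -40, -42, 0, 45, 36, 21) gives -22, so the mean works out to 443.4 gms; the sketch below shows the arithmetic, with variable names chosen for illustration.

```python
freqs = [14, 20, 42, 54, 45, 18, 7]
h = 10                                            # common class width
lower = [409.5 + h * i for i in range(len(freqs))]   # class boundaries
mids = [l + h / 2 for l in lower]                 # 414.5, 424.5, ...
n = sum(freqs)

# Step-deviation method with assumed mean A = 444.5
A = 444.5
d = [(x - A) / h for x in mids]                   # -3, -2, ..., 3
sum_fd = sum(f * di for f, di in zip(freqs, d))
mean = A + h * sum_fd / n

# Median by the interpolation formula (5.13)
half = n / 2
c = 0
for l, f in zip(lower, freqs):
    if c + f > half:
        median = l + (h / f) * (half - c)
        break
    c += f

print(sum_fd, round(mean, 2), round(median, 2))
```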
Example 5.22. Find the missing frequency from the following
distribution of sales of shops, given that the median sale of shops
is Rs. 2,400.


Sale in hundred Rs. :  0-10   10-20   20-30   30-40   40-50
No. of shops        :  5      25      ?       18      7

(Bombay U. B. Com. May 1980)
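Solution. Let the missing frequency be 'a'.


CALCULATIONS FOR MEDIAN

Sales in          No. of        Cumulative
hundred Rs.       shops (f)     frequency (c.f.)
0-10              5             5
10-20             25            30
20-30             a             30+a
30-40             18            48+a
40-50             7             55+a
Total             N = 55+a


Since median sales is Rs. 2,400 (24 hundred Rs.), 20-30 is the
median class. Using the median formula

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

we get

$$24 = 20 + \frac{10}{a}\left(\frac{55 + a}{2} - 30\right)$$

$$\Rightarrow\quad 4 = \frac{10}{a}\left(\frac{55 + a - 60}{2}\right) = \frac{5\,(a - 5)}{a}$$

$$\Rightarrow\quad 4a = 5a - 25$$

$$\Rightarrow\quad a = 25.$$


Hence, the missing frequency is 25.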
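
The algebra above is linear in the unknown frequency, so it can be solved in closed form. The sketch below is one way to do it, using exact fractions; the function name and its parameters are illustrative, and it assumes the missing frequency belongs to the median class itself, as in this example.

```python
from fractions import Fraction

def missing_median_class_frequency(l, h, median, cf_before, known_total):
    """Solve  median = l + (h/a) * ((known_total + a)/2 - cf_before)  for a.

    Rearranged:  a * (median - l - h/2) = h * (known_total/2 - cf_before).
    """
    numerator = h * (Fraction(known_total, 2) - cf_before)
    denominator = Fraction(median) - l - Fraction(h, 2)
    return numerator / denominator

# Median class 20-30 (l = 20, h = 10), median 24,
# c.f. before it = 5 + 25 = 30, known frequencies total 5 + 25 + 18 + 7 = 55
a = missing_median_class_frequency(20, 10, 24, 30, 55)

# Substitute back into formula (5.13) as a check
check = 20 + Fraction(10, a) * (Fraction(55 + a, 2) - 30)
print(a, check)
```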


Example 5.23. In the frequency distribution of 100 families
given below, the number of families corresponding to expenditure
groups 20-40 and 60-80 are missing from the table. However, the
median is known to be 50. Find the missing frequencies.

Expenditure     :  0-20   20-40   40-60   60-80   80-100
No. of families :  14     ?       27      ?       15

[Delhi Uni. B. Com. (Hons.) 1974]


Solution. Let the missing frequencies for the classes 20-40
and 60-80 be $f_1$ and $f_2$ respectively.


COMPUTATION OF MEDIAN

Expenditure       No. of             'Less than' c.f.
(in Rupees)       families (f)
0-20              14                 14
20-40             f1                 14+f1
40-60             27                 41+f1
60-80             f2                 41+f1+f2
80-100            15                 56+f1+f2
Total             N = 100 = 56+f1+f2


From the above table we have :

$$\sum f = 56 + f_1 + f_2 = 100 \text{ (Given)} \quad\Rightarrow\quad f_1 + f_2 = 100 - 56 = 44 \qquad (*)$$

Since median is given to be 50, which lies in the class 40-60,
therefore, 40-60 is the median class. Using the median formula
we get :

$$\text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - C\right)$$

$$\Rightarrow\quad 50 = 40 + \frac{20}{27}\,\bigl[\,50 - (14 + f_1)\,\bigr]$$

$$\Rightarrow\quad 50 - 40 = \frac{20}{27}\,(36 - f_1)$$

$$\Rightarrow\quad 27 = 2\,(36 - f_1) = 72 - 2f_1$$

$$\Rightarrow\quad 2f_1 = 72 - 27 = 45$$

$$\Rightarrow\quad f_1 = \frac{45}{2} = 22.5 \simeq 23$$

[Since frequency can't be fractional]
Substituting in (*) we get :

$$f_2 = 44 - f_1 = 44 - 23 = 21.$$
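
The two unknowns can be recovered in the same order as in the solution: first $f_1$ from the median formula, then $f_2$ from the total. The sketch below uses exact fractions and rounds 22.5 upward explicitly, since Python's built-in `round` would round a half to the nearest even number; the variable names are illustrative.

```python
from fractions import Fraction

n = 100                        # total number of families (given)
known = 14 + 27 + 15           # frequencies that are not missing
l, h, f = 40, 20, 27           # median class 40-60
median = 50

# From  median = l + (h/f) * (n/2 - (14 + f1)),  solve for f1
f1_exact = Fraction(n, 2) - 14 - Fraction((median - l) * f, h)

# Round half up, as the text does (22.5 is taken as 23)
f1 = int(f1_exact + Fraction(1, 2))
f2 = n - known - f1            # from f1 + f2 = 44

# Median implied by the rounded frequencies; close to, not exactly, 50
implied_median = l + (h / f) * (n / 2 - (14 + f1))
print(f1_exact, f1, f2, round(implied_median, 2))
```

Because of the rounding, the completed table reproduces the stated median only approximately.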
5.6.1. Partition Values. The values which divide the series
into a number of equal parts are called the partition values. Thus
median may be regarded as a particular partition value which
divides the given data into two equal parts.


Quartiles. The values which divide the given data into four
equal parts are known as quartiles. Obviously there will be three
such points $Q_1$, $Q_2$ and $Q_3$ such that $Q_1 \le Q_2 \le Q_3$, termed as the
three quartiles. $Q_1$, known as the lower or first quartile, is the value
which has 25% of the items of the distribution below it and consequently
75% of the items are greater than it. Incidentally $Q_2$, the
second quartile, coincides with the median and has an equal number
of observations above it and below it. $Q_3$, known as the upper or
third quartile, has 75% of the observations below it and consequently
25% of the observations above it.


The working principle for computing the quartiles is basically
the same as that of computing the median.


To compute $Q_1$, the following steps are required :
(i) Find N/4 where $N = \sum f$ is the total frequency.
(ii) See the (less than) cumulative frequency (c.f.) just greater
than N/4.
(iii) The corresponding value of X gives the value of $Q_1$. In
case of continuous frequency distribution, the corresponding class
contains $Q_1$ and the value of $Q_1$ is obtained by the interpolation
formula :

$$Q_1 = l + \frac{h}{f}\left(\frac{N}{4} - C\right) \tag{5.14}$$

where l is the lower limit of the class containing $Q_1$,
f is the frequency of the class containing $Q_1$,
h is the magnitude of the class containing $Q_1$,
and C is the cumulative frequency (c.f.) of the class preceding
the class containing $Q_1$.


Similarly to compute $Q_3$ see the (less than) c.f. just greater
than 3N/4. The corresponding value of X gives $Q_3$. In case of
continuous frequency distribution, the corresponding class contains $Q_3$
and the value of $Q_3$ is given by the formula :

$$Q_3 = l + \frac{h}{f}\left(\frac{3N}{4} - C\right) \tag{5.15}$$

where l is the lower limit of the class containing $Q_3$,
h is the magnitude of the class containing $Q_3$,
f is the frequency of the class containing $Q_3$,
and C is the c.f. of the class preceding the class containing
$Q_3$.
Deciles. Deciles are the values which divide the series into
ten equal parts. Obviously there are nine deciles $D_1, D_2, D_3, \ldots, D_9$
(say), such that $D_1 \le D_2 \le \ldots \le D_9$. Incidentally $D_5$ coincides with the
median.

The method of computing the deciles $D_i$ (i=1, 2, ..., 9) is the
same as discussed for $Q_1$ and $Q_3$. To compute the ith decile $D_i$
(i=1, 2, ..., 9) see the c.f. just greater than $iN/10$. The corresponding
value of X is $D_i$. In case of continuous frequency distribution the
corresponding class contains $D_i$ and its value is obtained by the
formula

$$D_i = l + \frac{h}{f}\left(\frac{i \times N}{10} - C\right), \quad (i = 1, 2, \ldots, 9) \tag{5.16}$$

where l is the lower limit of the class containing $D_i$,
f is the frequency of the class containing $D_i$,
h is the magnitude of the class containing $D_i$,
and C is the c.f. of the class preceding the class containing $D_i$.


Percentiles. Percentiles are the values which divide the series
into 100 equal parts. Obviously, there are 99 percentiles $P_1, P_2, \ldots,
P_{99}$ such that $P_1 \le P_2 \le \ldots \le P_{99}$. The ith percentile $P_i$ (i=1, 2, ..., 99)
is the value of X corresponding to c.f. just greater than $iN/100$.
In case of continuous frequency distribution, the corresponding
class contains $P_i$ and its value is obtained by the interpolation
formula :

$$P_i = l + \frac{h}{f}\left(\frac{i \times N}{100} - C\right), \quad (i = 1, 2, \ldots, 99) \tag{5.17}$$

where l is the lower limit of the class containing $P_i$,
f is the frequency of the class containing $P_i$,
h is the magnitude of the class containing $P_i$,
and C is the c.f. of the class preceding the class containing $P_i$.


In particular, we shall have :

$$P_{25} = Q_1, \quad P_{50} = D_5 = Q_2, \quad P_{75} = Q_3$$
$$D_1 = P_{10}, \quad D_2 = P_{20}, \quad D_3 = P_{30}, \quad \ldots, \quad D_9 = P_{90}.$$

Remarks. Importance of partition values. Partition values,
particularly the percentiles, are specially useful in the scaling and
ranking of test scores in psychological and educational statistics. In
the data relating to business and economic statistics these partition
values, specially quartiles, are useful in personnel work and
productivity ratings.

5.6.2. Graphic Method of Locating Partition Values. The
various partition values viz., quartiles, deciles and percentiles can be
easily located graphically with the help of a curve called the
cumulative frequency curve or Ogive. The procedure involves the following
steps :

Step 1. Represent the given distribution in the form of a less
than cumulative frequency distribution.


2. Take the values of the variable (in the case of frequency
distribution) and the class intervals (in the case of continuous
frequency distribution) along the horizontal scale (X-axis) and the
cumulative frequency along the vertical scale (Y-axis).


3. Plot the c.f. against the corresponding value of the variable
(in the case of frequency distribution) and against the upper
limit of the corresponding class (in the case of continuous frequency
distribution).


4. The smooth curve obtained by joining the points so obtained
by means of a free-hand drawing is called 'less than' cumulative
frequency curve or 'less than' ogive.


The various partition values can be easily obtained from this
ogive as illustrated in Example 5.26.


More Than Ogive. In this case we form the 'more than'
cumulative frequency distribution and plot it against the corresponding
value of the variable or against the lower limit of the corresponding
class (in case of continuous frequency distribution). The
curve obtained on joining the points so obtained by smooth free
hand drawing is called 'more than' cumulative frequency curve or
'more than' ogive.


Remark. If we draw a perpendicular from the point of intersection
of the two ogives on the x-axis, the foot of the perpendicular
gives the value of median.


Example 5.24. Calculate the median, quartiles, 6th decile and
70th percentile from the following data.


Marks            No. of students    Marks            No. of students
Less than 80     100                Less than 40     32
Less than 70     90                 Less than 30     20
Less than 60     80                 Less than 20     18
Less than 50     60                 Less than 10     6


(Kurukshetra U. B. Com. Sept. 1981)


Solution. We are given 'less than' cumulative frequency distribution.
We shall first convert it into a grouped frequency
distribution. Since 'marks' is a discrete random variable taking
only integral values, the classes are 0-9, 10-19, ..., 70-79.
Further, since the formulae for median, quartiles and percentiles are
based on continuous frequency distribution, we convert the distribution
into exclusive type classes with class boundaries below 9.5,
9.5-19.5, ..., 69.5-79.5 as given in the following table.

COMPUTATIONS FOR MEDIAN, QUARTILES AND
PERCENTILES

Marks less than    c.f.    Frequency (f)     Class boundaries
10                 6       6                 Below 9.5
20                 18      18-6 = 12         9.5-19.5
30                 20      20-18 = 2         19.5-29.5
40                 32      32-20 = 12        29.5-39.5
50                 60      60-32 = 28        39.5-49.5
60                 80      80-60 = 20        49.5-59.5
70                 90      90-80 = 10        59.5-69.5
80                 100     100-90 = 10       69.5-79.5


Median. N/2 = 100/2 = 50. The c.f. just greater than 50 is 60. Hence
the corresponding class 39.5-49.5 is the median class.

$$\therefore\quad \text{Median} = 39.5 + \frac{10}{28}\,(50 - 32) = 39.5 + \frac{10 \times 18}{28} = 39.5 + 6.43 = 45.93$$

Hence median marks are 45.93.


Quartiles. N/4 = 100/4 = 25 and 3N/4 = (3×100)/4 = 75. The c.f. just
greater than N/4 is 32. Hence the corresponding class 29.5-39.5
contains $Q_1$ which is given by :

$$Q_1 = 29.5 + \frac{10}{12}\,(25 - 20) = 29.5 + \frac{10 \times 5}{12} = 29.5 + 4.17 = 33.67$$

The c.f. just greater than 3N/4 = 75 is 80. Hence the corresponding
class 49.5-59.5 contains $Q_3$ which is given by :

$$Q_3 = 49.5 + \frac{10}{20}\,(75 - 60) = 49.5 + \frac{10 \times 15}{20} = 49.5 + 7.5 = 57.0$$


6th Decile. 6N/10 = (6×100)/10 = 60. The c.f. just greater than 60 is 80.
Hence the corresponding class 49.5-59.5 contains $D_6$ which is given
by :

$$D_6 = 49.5 + \frac{10}{20}\,(60 - 60) = 49.5$$


70th Percentile. 70N/100 = (70×100)/100 = 70. The c.f. just greater than
70 is 80. Hence the corresponding class 49.5-59.5 contains $P_{70}$ which
is given by :

$$P_{70} = l + \frac{h}{f}\left(\frac{70N}{100} - C\right) = 49.5 + \frac{10}{20}\,(70 - 60) = 49.5 + \frac{10 \times 10}{20} = 49.5 + 5 = 54.5$$
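
When the data arrive as a 'less than' cumulative table, as here, the partition values can be read off without first differencing into class frequencies. The sketch below does this for the present example; the function name `value_at` is illustrative, the two lowest cumulative figures (6 and 18) are taken as printed in the question, and the first class is assumed to start at -0.5. Neither of those choices affects the five results, which all fall in classes from 29.5 upward.

```python
from fractions import Fraction

def value_at(upper_bounds, cum_freq, first_lower, k, parts):
    """k-th of `parts` partition values from a 'less than' c.f. table."""
    n = cum_freq[-1]
    target = Fraction(k * n, parts)          # kept exact, e.g. 6N/10 = 60
    lower, prev_cf = first_lower, 0
    for upper, cf in zip(upper_bounds, cum_freq):
        if cf > target:                      # c.f. just greater than target
            width = upper - lower
            class_freq = cf - prev_cf
            return lower + (width / class_freq) * float(target - prev_cf)
        lower, prev_cf = upper, cf
    raise ValueError("k must be smaller than parts")

uppers = [9.5 + 10 * i for i in range(8)]    # 9.5, 19.5, ..., 79.5
cum = [6, 18, 20, 32, 60, 80, 90, 100]

median = value_at(uppers, cum, -0.5, 1, 2)
q1 = value_at(uppers, cum, -0.5, 1, 4)
q3 = value_at(uppers, cum, -0.5, 3, 4)
d6 = value_at(uppers, cum, -0.5, 6, 10)
p70 = value_at(uppers, cum, -0.5, 70, 100)
print(round(median, 2), round(q1, 2), q3, d6, p70)
```

The 6th decile shows why the comparison must be strict: 6N/10 equals the cumulative frequency 60 exactly, so the class taken is the next one, 49.5-59.5, and the interpolation term is zero.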

Example 5.25. Comment on the following statement :

"The median of a distribution is N/2, the lower quartile is N/4
and the upper quartile is 3N/4" (Here N denotes the total frequency).

[I.C.W.A. (Intermediate) Dec. 1974]

Solution. The statement is wrong. The median of a distribution
is not N/2 but it is the value of the variable X which divides the
distribution into two equal parts i.e., median is the value of the
variable X such that N/2 (i.e., 50%) of the observations are less than
it and N/2 observations exceed it. The lower quartile $Q_1$ is not N/4
but it is the value of the variable such that N/4 (i.e., 25%) of the
observations are less than $Q_1$. Similarly the upper quartile $Q_3$ is not
3N/4 but it is the value of the variable such that 3N/4 (i.e., 75%) of
the observations are less than it.


Example 5.26. The following are the marks obtained by the
students in Statistics :


Marks                Number of students    Marks                Number of students
10 marks or less     4                     40 marks or less     40
20 marks or less     10                    50 marks or less     47
30 marks or less     30                    60 marks or less     50


Draw a curve on the graph paper and show therein :
(i) The range of marks obtained by middle 80% of the students.
(ii) The median.
Also verify your results by direct formula calculations.
[Delhi Uni. B. Com. (Hons) 1975]


Solution. The above data can be arranged in the form of a
continuous frequency distribution as given in the following table :


Marks      No. of students (f)    (Less than) c.f.
0-10       4                      4
10-20      10-4 = 6               10
20-30      30-10 = 20             30
30-40      40-30 = 10             40
40-50      47-40 = 7              47
50-60      50-47 = 3              N = Σf = 50

Less than Ogive. Plot the less than c.f. against the corresponding
value of the variable in the original table (or against the
upper limit of the corresponding class in the above table) and join
these points by a smooth free hand curve to obtain ogive. At the
frequency N/2 = 25 (along the Y-axis) draw a line parallel to x-axis
meeting the ogive at point P. Draw PM perpendicular to the x-axis.
Then OM = 27.5 is the median marks.


Fig. 5.1. Ogive (less than), with P10 = 11.7 (app.), Md = 27.5 and
P90 = 47 (app.) marked on the marks axis.


The range of the marks obtained by the middle 80% of the
students is given by $P_{90} - P_{10}$. To find $P_{90}$ and $P_{10}$ graphically, at
the frequency $\frac{90}{100}N = 45$ and $\frac{10}{100}N = 5$, draw lines parallel to
the x-axis meeting the (less than) ogive at Q and R respectively.
Draw QN and RL perpendicular to the x-axis. Then
$P_{90}$ = ON = 47 (app.) and $P_{10}$ = OL = 11.7 (app.)
∴ Required range of marks = $P_{90} - P_{10}$ = 47 - 11.7 = 35.3


Values by direct calculations.
Median. Here N/2 = 25. The c.f. just greater than 25 is 30.
Thus the corresponding class 20-30 is the median class. Using
median formula, we get

$$\text{Median} = 20 + \frac{10}{20}\,(25 - 10) = 20 + \frac{1}{2} \times 15 = 20 + 7.5 = 27.5$$


$P_{10}$ and $P_{90}$. $\frac{10}{100}N = \frac{10}{100} \times 50 = 5$.
The c.f. just greater than 5 is 10. Hence $P_{10}$ lies in the corresponding
class 10-20.

$$\therefore\quad P_{10} = 10 + \frac{10}{6}\,(5 - 4) = 10 + \frac{10}{6} = 10 + 1.67 = 11.67$$

$\frac{90}{100}N = \frac{90}{100} \times 50 = 45$. The c.f. just greater than 45 is 47.
Hence the corresponding class 40-50 contains $P_{90}$ and

$$P_{90} = 40 + \frac{10}{7}\,(45 - 40) = 40 + \frac{10 \times 5}{7} = 40 + 7.14 = 47.14$$


Hence the range of the marks obtained by the middle 80% of
the students is

$$P_{90} - P_{10} = 47.14 - 11.67 = 35.47$$

Example 5.27. From the following data calculate the percentage
of workers getting wages :

(a) more than Rs. 44.

(b) between Rs. 22 and Rs. 58.

(c) Find $Q_1$ and $Q_3$.

Wages (Rs.)    :  0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of workers :  20     45      85      160     70      55      35      30

[C.A. (Intermediate) May 1976]


Solution. Here N = Total number of workers = 500. On the
assumption that the frequencies are uniformly distributed over the
entire interval, the number of persons with desired wages are
computed below :


(a) Number of persons with wages more than Rs. 44 is :

$$\left(\frac{50 - 44}{10} \times 70\right) + 55 + 35 + 30 = 42 + 55 + 35 + 30 = 162.$$

Hence the percentage of workers getting wages over Rs. 44 is :

$$\frac{162}{500} \times 100 = 32.4\%.$$


(b) Number of workers with wages between Rs. 22 and Rs. 58
is given by :

$$\left[\frac{30 - 22}{10} \times 85\right] + 160 + 70 + \left[\frac{58 - 50}{10} \times 55\right] = \frac{8 \times 85}{10} + 160 + 70 + \frac{8 \times 55}{10} = 68 + 160 + 70 + 44 = 342.$$

Hence the percentage of workers getting wages between Rs. 22
and Rs. 58 is :

$$\frac{342}{500} \times 100 = 68.4\%$$


(c) N/4 = 125 and 3N/4 = 375.


COMPUTATION OF Q1 AND Q3

Wages (Rs.)    No. of workers (f)    (Less than) c.f.
0-10           20                    20
10-20          45                    65
20-30          85                    150
30-40          160                   310
40-50          70                    380
50-60          55                    435
60-70          35                    470
70-80          30                    500


The c.f. just greater than N/4 = 125 is 150 and, therefore, the
corresponding class 20-30 contains $Q_1$.

$$\therefore\quad Q_1 = 20 + \frac{10}{85}\,(125 - 65) = 20 + \frac{10 \times 60}{85} = 20 + 7.06 = 27.06$$

The c.f. just greater than 3N/4 = 375 is 380. Thus the corresponding
class 40-50 contains $Q_3$.

$$\therefore\quad Q_3 = 40 + \frac{10}{70}\,(375 - 310) = 40 + \frac{10 \times 65}{70} = 40 + 9.29 = 49.29$$
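
Parts (a) and (b) run the interpolation in the opposite direction: instead of finding the wage at a given cumulative frequency, they find the cumulative frequency at a given wage. A minimal sketch under the same assumption of uniform spread within each class is given below; the helper name `workers_below` is illustrative.

```python
boundaries = [0, 10, 20, 30, 40, 50, 60, 70, 80]
freqs = [20, 45, 85, 160, 70, 55, 35, 30]
n = sum(freqs)

def workers_below(x):
    """Estimated number of workers earning less than x, assuming the
    frequency of each class is spread evenly over the class."""
    total = 0.0
    for lower, upper, f in zip(boundaries, boundaries[1:], freqs):
        if x >= upper:
            total += f                                   # whole class below x
        elif x > lower:
            total += f * (x - lower) / (upper - lower)   # part of the class
    return total

more_than_44 = n - workers_below(44)
between_22_and_58 = workers_below(58) - workers_below(22)

pct_a = 100 * more_than_44 / n
pct_b = 100 * between_22_and_58 / n
print(more_than_44, between_22_and_58, pct_a, pct_b)
```

`workers_below` is simply the 'less than' ogive drawn with straight segments between the plotted points, so the same helper answers any "how many earn between" question for this table.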


EXERCISE 5.2 


1. Define median and discuss its relative merits and demerits.


2. "The mean is the most common measure of central tendency of the data.
It satisfies almost all the requirements of a good average. The median is also an
average, but it does not satisfy all the requirements of a good average. However,
it carries certain merits and hence is useful in particular fields." Critically
examine both the averages.
3. What do you understand by central tendency ? Under what conditions
is median more suitable than other measures of central tendency ?
(Bombay U. B. Com. May 1980)
4. In each of the following cases, explain whether the description applies
to mean, median or both :
(i) Can be calculated from a frequency distribution with open end
classes.
(ii) The values of all items are taken into consideration in the calculation.
(iii) The values of extreme items do not influence the average.
(iv) In a distribution with a single peak and moderate skewness to the right
it is closer to the concentration of the distribution.
Ans. [(i) median, (ii) mean, (iii) median, (iv) median.]

5. (a) Find the medians of the following two series :
(i) 38, 34, 39, 35, 32, 31, 37, 30, 41.
(ii) 30, 31, 36, 33, 29, 28, 35, 36.


Ans. (i) 35, (ii) 32
(Lucknow U. B. Com. 1982)


6. What are the properties of median ?
Following are the marks obtained by a batch of 10 students in a certain
class test in Statistics (X) and Accountancy (Y) :


Roll No. :  1    2    3    4    5    6    7    8    9    10
X        :  63   64   62   32   30   60   47   46   35   28
Y        :  68   66   35   42   26   85   44   80   33   72


In which subject is the level of knowledge of the students higher ?
(Kurukshetra U. B. Com. II Sept. 1982)


Ans. Md. (X) = 46.5, Md. (Y) = 55. Level of knowledge of students is
higher in Accountancy.


7. The following marks have been obtained in three papers of Accountancy
in an examination by 12 students. In which paper is the general level of
knowledge of the students highest ?

Students :  1    2    3    4    5    6    7    8    9    10   11   12
Paper A  :  60   56   41   46   54   59   55   51   52   44   37   39
Paper B  :  58   54   21   51   59   46   65   31   68   41   70   36
Paper C  :  65   55   26   40   30   74   45   32   85   32   80   39


Ans. Md (A) = 51.5, Md (B) = 52.5, Md (C) = 42.5. Hence the general
level of knowledge of students is highest in paper B.


8. Find mean and median from the data given below :
Marks obtained  :  0-10   10-20   20-30   30-40   40-50   50-60
No. of students :  12     18      27      20      17      6
(Guru Nanak Dev U. B. Com. II April 1983)


Ans. Mean = 27, Median = 27.41


9. Calculate the median for the following statistical distribution :
Value     :  Less than 100   100-200   200-300   300-400   400 and above   Total
Frequency :  50              90        158       68        134             500
[I.C.W.A. (Intermediate) Dec. 1981]


Ans. 269.62.


10. What are the various measures of central tendency ? Choose an
appropriate measure for the following distribution :
Monthly income (in Rs.)      No. of
in locality X                families
Below 100                    50
100-200                      500
200-300                      555
300-400                      100
400-500                      3
500 and above                2


Ans. Median = Rs. 209.90
[Delhi U. B.A. (Econ. Hons.), 1976]
11. The following table gives the distribution of marks secured by some
students in a certain examination :


Marks           :  0-20   21-30   31-40   41-50   51-60   61-70   71-80
No. of students :  42     38      120     84      48      36      31


Find (1) Median marks.
(2) The percentage of failure if minimum for a pass is 35 marks.
(Himachal Pradesh U. B. Com. April 1981)
Ans. (1) Md = 40.46 ; (2) 31.58%.
12. Find out the value of median for the following data :


Weight in lbs.    No. of students    Weight in lbs.    No. of students
60-64             10                 85-94             26
65-74             39                 95-99             9
75-79             35                 100-104           6
80-84             20


Ans. 77.86 lbs.
(Delhi U. B. Com. 1979)


13. Find the average wage of a labourer from the following table :
Wages (in Rs.)    No. of labourers    Wages (in Rs.)    No. of labourers
Above 0           650                 Above 40          300
Above 10          500                 Above 50          275
Above 20          425                 Above 60          250
Above 30          375                 Above 70          100


(Kerala U. B. Com., April 1978 ; Nagpur U. B. Com., Oct. 1977)
Ans. Md = Rs. 36.67


14. The following table gives the weekly wages in rupees in a certain
commercial organisation.


Weekly wages (Rs.) :  30-   32-   34-   36-   38-   40-   42-   44-   46-   48-50
Frequency          :  3     8     24    31    50    61    38    21    12    2

Find (i) the median and the first quartile, (ii) the number of wage earners
receiving between Rs. 37 and 47 per week.
[I.C.W.A. (Intermediate) June 1982]
Ans. (i) Md = Rs. 40.30 ; Q1 = Rs. 37.77 ; (ii) 191

15. The table below shows a frequency distribution of grades on the 


final examination in college algebra. 


(a) Find the three quartiles of the distribution, and
(b) Interpret clearly the significance of each.


Grade Number of Grade Number of 
students students 
90— 100 9 50—59 1 
80—89 32 40—49 3 
70—79 eu 30-39 1 
E 1 
RS Total 120 


(Punjab U. B. Com. Sept. 1980) 
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16. Find: 


(a) the 2nd decile,
(b) the 4th decile,
(c) the 90th percentile, and
(d) the 68th percentile for the data given below, interpreting clearly the
significance of each :


Age of Head of family    Number           Age of Head of    Number
(Years)                  (in millions)    family (Years)    (in millions)
Under 25                 2.22             55-64             6.63
25-29                    4.05             65-74             4.16
30-34                    5.08             75 and over       1.66
35-44                    10.45
45-54                    9.47             Total             43.72
(Punjab U. B. Com. 1980)
Ans. D2 = 31.94, D4 = 40.38, P90 = 67.98, P68 = 52.87
17. Find the semi inter-quartile range for the following distribution :
Monthly Income    No. of       Monthly Income    No. of
(in Rs.)          families     (in Rs.)          families
0-75              14           200-250           334
75-100            52           250-350           443
100-150           200          350-550           218
150-200           239          550-800           225


Find also the median and find its difference from the average of two
quartiles.


[I.C.W.A. (Intermediate) Dec. 1980]
Ans. ½(Q3 - Q1) = 88.11 ; Median = 255.30 ;
| Md - ½(Q1 + Q3) | = 17.38
18. Draw a less than cumulative frequency curve for the following
distribution of incomes. Find the fourth decile from the graph. Also find the
limits for the middle 60% of the distribution.


Income in '00 of Rs. :  2-4   4-6   6-8   8-10   10-12   12-14   14-16   16-18
No. of persons       :  18    22    50    112    135     110     43      10


(Bombay U. B. Com. April 1982)
Ans. D4 = 9.96 ; (8.18 to 13.15)
19. Draw an ogive for the data given below and show how can the 
value of median be read off from this graph. Verify your result. 


Class Interval : 0—5 5—10 10—15 15—20 20—25 nus 
Frequency : 5 10 15 8 7


(Guru Nanak Dev U. B. Com. II Sept. 1983)
Ans. Median = 13.5 ; By formula, Md = 13.33


20. Draw a 'less than ogive' from the following data and hence find out
the value of lower quartile.
Class Interval :  0-5   5-10   10-20   20-30   30-40   40-50
Frequency      :  5     7      15      20      8       5


(Delhi U. B. Com. III, 1984)
Ans. Q1 = 12


21. The frequency distribution of heights of 100 college students is
as follows :


Height (cms) :  141-150   151-160   161-170   171-180   181-190   Total
Frequency    :  5         16        56        19        4         100


Draw an Ogive (less than or more than type) of this distribution
and from the ogive find (i) the first quartile, (ii) the median and (iii) the
third quartile.


[I.C.W.A. (Intermediate) Dec. 1981]
Ans. Q1 = 161.2 cms, Q3 = 170.1 cms, Median = 165.7 cms.


22. Draw a less than cumulative frequency curve for the following 
data and find from the graph the value of seventh decile, 


Monthly No. of Monthly No. of 
income workers income workers 
0—100 12 500 —600 20 
100 —200 28 600—700 20 
200 —300 35 700—800 17 
300 — 400 65 800—900 13 
400—500 30 900 —1000 10 


(Bombay U.B.Com. April 1983) 
Ans. $D_7 = 545$


23. One hundred and twenty students appeared for a certain test and
the following marks distribution was obtained:

Marks    : 0—20  20—40  40—60  60—80  80—100
Students : 10    30     36     30     14

Find: (i) The limits of marks of the middle 30% of students.
(ii) The percentage of students getting marks more than 75.
(iii) The number of students who fail, if 35 marks are required
for passing.

Ans. (i) $P_{35} = 41.1$; $P_{65} = 61.3$

(ii) $\left[ 14 + \dfrac{80 - 75}{20} \times 30 \right] \times \dfrac{100}{120} = 17.9\%$

(iii) $10 + \dfrac{15}{20} \times 30 = 32.5 \approx 33$


24. The expenditure of 1,000 families is given as under:

Expenditure (in Rs.) : 40—59  60—79  80—99  100—119  120—139
No. of families      : 50     ?      500    ?        50

The median for the distribution is Rs. 87. Calculate the missing
frequencies.

Ans. 262.5, 137.5 ≈ 263, 137. (C.A. Intermediate, May 1977)
25. An incomplete frequency distribution is given as follows:

Variable    Frequency
10—20       12
20—30       30
30—40       ?
40—50       65
50—60       ?
60—70       25
70—80       19

Total       230

You are given that the median value is 46.

(a) Using the median formula, fill up the missing frequencies.

(b) Calculate the arithmetic mean of the completed table.
(Guru Nanak Dev U. B. Com. 1981)


26. The following data represent travel expenses (other than
transportation) for 7 trips made during November by a salesman for a small
firm:

Trip     Days     Expense     Expense per day
                  (Rs.)       (Rs.)
1        0.5      13.50       27
2        2.0      12.00       6
3        3.5      17.50       5
4        1.0      9.00        9
5        9.0      27.00       3
6        0.5      9.00        18
7        8.5      17.00       2
Total    25.0     105.00      70

An auditor criticised these expenses as excessive, asserting that the
average expense per day is Rs. 10 (Rs. 70 divided by 7). The salesman
replied that the average is only Rs. 4.20 (Rs. 105 divided by 25) and that
in any event the median is the appropriate measure and is only Rs. 3. The
auditor rejoined that the arithmetic mean is the appropriate measure, but
that the median is Rs. 6.

You are required to:
(i) Explain the proper interpretation of each of the four averages
mentioned.
(ii) State which average seems appropriate to you.
(A.I.M.A. Diploma in Management, January 1980)


27. For a certain group of saree weavers of Varanasi, the median and
quartile earnings per week are Rs. 44.3, Rs. 43.0 and Rs. 45.9 respectively.
The earnings for the group range between Rs. 40 and Rs. 50. Ten per cent of
the group earn under Rs. 42, 13% earn Rs. 47 and over and 6% Rs. 48 and
over. Put these data in the form of a frequency distribution and obtain the
value of the mean wage.
Ans.
Wages          : 40—42  42—43  43—44.3  44.3—45.9  45.9—47  47—48  48—50
No. of workers : 10     15     25       25         12       7      6
$\bar{x}$ = Rs. 44.5
5.7. Mode. Mode is the value which occurs most frequently
in a set of observations and around which the other items of
the set cluster densely. In other words, mode is the value of a
series which is predominant in it. In the words of Croxton
and Cowden, “The mode of a distribution is the value at the point
around which the items tend to be most heavily concentrated. It may
be regarded as the most typical of a series of values.”


In the following statements:
(i) Average size of the shoe sold in a shop is 7,

(ii) Average height of an Indian (male) is 5 feet 6 inches
(1.68 metres approx.),

(iii) Average collar size of the shirt sold in a ready-made
garment shop is 35 cms,

(iv) Average student in a professional college spends Rs. 250
per month;
the average referred to is neither mean nor median but
mode, the most frequent value in the distribution. For example,
by the first statement we mean that there is maximum demand for
the shoe of size No. 7.


5.7.1. Computation of Mode. In the case of a frequency
distribution, mode is the value of the variable corresponding to the
maximum frequency. This method can be applied with ease and
simplicity if the distribution is ‘unimodal’, i.e., if it has only one
mode. In other words, this method can be used with convenience if
there is only one value with the highest concentration of observations.
For example, in the following distribution:

x : 1  2  3   4   5   6   7   8   9
f : 3  1  18  25  40  30  22  10  6

the maximum frequency is 40 and therefore the corresponding
value of X, viz., 5, gives the value of mode. In the case of a frequency
curve (see diagram below), mode corresponds to the peak of the
curve.
[Fig. 5.2. Frequency curve: frequency on the vertical axis, X on the horizontal axis; the mode is the value of X under the peak of the curve.]
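The inspection rule for a discrete frequency distribution can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `mode_by_inspection` is hypothetical, and the frequencies are those of the distribution above as read from the text (x = 1 to 9, f = 3, 1, 18, 25, 40, 30, 22, 10, 6).

```python
def mode_by_inspection(values, freqs):
    """Return every value whose frequency equals the maximum frequency.

    A single returned value means the distribution is unimodal in the
    sense used above; more than one value means the maximum frequency
    is repeated and inspection alone does not settle the mode.
    """
    peak = max(freqs)
    return [x for x, f in zip(values, freqs) if f == peak]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
f = [3, 1, 18, 25, 40, 30, 22, 10, 6]

print(mode_by_inspection(x, f))  # [5]
```

Returning a list rather than a single number makes the repeated-maximum case visible instead of silently picking one value.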
In the case of a continuous frequency distribution, the class
corresponding to the maximum frequency is called the modal class
and the value of mode is obtained by the interpolation formula:

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{(f_1 - f_0) + (f_1 - f_2)} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} \qquad \ldots(5.18)$$

where $l$ is the lower limit of the modal class,
$f_1$ is the frequency of the modal class,
$f_0$ is the frequency of the class preceding the modal class,
$f_2$ is the frequency of the class succeeding the modal class,
and $h$ is the magnitude of the modal class.
The symbols $f_0$, $f_1$ and $f_2$ can be explained easily as follows:
$f_0$ : Frequency of preceding class.
$f_1$ : Maximum frequency (frequency of modal class).
$f_2$ : Frequency of succeeding class.
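As a rough sketch, the interpolation formula (5.18) translates directly into a small Python function. The name `grouped_mode` and its argument order are assumptions made for illustration; the sample values (l = 50, h = 10, f0 = 37, f1 = 45, f2 = 27) are those of the weight distribution worked out in Example 5.28.

```python
def grouped_mode(l, h, f0, f1, f2):
    """Mode of a continuous frequency distribution by formula (5.18).

    l  : lower limit of the modal class
    h  : magnitude (width) of the modal class
    f0 : frequency of the class preceding the modal class
    f1 : frequency of the modal class
    f2 : frequency of the class succeeding the modal class
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        # The formula breaks down when 2*f1 - f0 - f2 = 0.
        raise ValueError("2*f1 - f0 - f2 is zero; formula (5.18) cannot be used")
    return l + h * (f1 - f0) / denominator


print(round(grouped_mode(50, 10, 37, 45, 27), 3))  # 53.077
```

The explicit check on the denominator anticipates the situation in which the modal class found by grouping does not carry the maximum frequency.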


Remarks 1. It may be pointed out that the above formula
for computing mode is based on the following assumptions:

(i) The frequency distribution must be continuous with
exclusive type classes without any gaps. If the data are not given in
the form of continuous classes, they must first be converted into
continuous classes before applying formula (5.18).

(ii) The class intervals must be uniform throughout, i.e., the
width of all the class intervals must be the same. In the case of a
distribution with unequal class intervals, they should be made equal
under the assumption that the frequencies are uniformly distributed
over all the classes; otherwise the value of mode computed from
(5.18) will be misleading.


2. However, the above technique of locating mode is not
practicable in the following situations:

(i) If the maximum frequency is repeated or approximately
equal concentration is found in two or more neighbouring values.

(ii) If the maximum frequency occurs either at the very
beginning or at the end of the distribution.

(iii) If there are irregularities in the distribution, i.e., the
frequencies of the variable increase or decrease in a haphazard way.

In the above situations mode (or the modal class in the case of a
continuous frequency distribution) is located by the method of
grouping as discussed in Examples 5.37 and 5.38.


3. If the method of grouping gives a modal class which
does not correspond to the maximum frequency, i.e., the frequency $f_1$
of the modal class is not the maximum frequency, then in some
situations we may get $2f_1 - f_0 - f_2 = 0$. [This is not possible if $f_1$ is
the maximum and $f_0$ and $f_2$ are less than $f_1$.] In such a situation, viz.,
$2f_1 - f_0 - f_2 = 0$, the value of mode cannot be computed by the
formula

$$M_o = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

as it gives $M_o = l + \infty = \infty$. [$\because\ 2f_1 - f_0 - f_2 = 0$]

In such cases, the value of mode can be obtained by the
formula:

$$M_o = l + \frac{h(f_1 - f_0)}{|f_1 - f_0| + |f_1 - f_2|} \qquad \ldots(*)$$

where $|A|$ represents the absolute (positive) value of $A$.

Formula (*) is only an approximate formula and does not give a
very accurate result, because further grouping of classes, say 4 at a
time, may give a different modal class and as such a
different result.
As an illustration, for the following data:

X : 10—20  20—30  30—40  40—50  50—60  60—70  70—80  80—90  90—100  100—110
f : 4      6      5      10     20     22     24     6      2       1

the usual method of grouping (up to 3 classes at a time) will give
60—70 as the modal class, such that $f_1 = 22$, $f_0 = 20$, $f_2 = 24$ and
therefore $2f_1 - f_0 - f_2 = 44 - 20 - 24 = 0$. Hence the usual formula for
mode cannot be applied. Using (*), an approximate value of mode
may be obtained as:

$$M_o = l + \frac{h(f_1 - f_0)}{|f_1 - f_0| + |f_1 - f_2|} = 60 + \frac{10(22 - 20)}{|22 - 20| + |22 - 24|} = 60 + \frac{10 \times 2}{4} = 60 + 5 = 65$$
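The approximate formula (*) can be checked numerically with a short sketch. The function name `approximate_mode` is hypothetical; the numerator is taken as h(f1 - f0), as in the worked illustration above, and the inputs are the values l = 60, h = 10, f0 = 20, f1 = 22, f2 = 24 from that illustration.

```python
def approximate_mode(l, h, f0, f1, f2):
    """Approximate mode by formula (*), for use when 2*f1 - f0 - f2 = 0.

    The denominator uses absolute differences, so it vanishes only when
    f0 = f1 = f2, in which case no interpolation is possible.
    """
    denominator = abs(f1 - f0) + abs(f1 - f2)
    if denominator == 0:
        raise ValueError("f0, f1 and f2 are all equal; mode is not located")
    return l + h * (f1 - f0) / denominator


print(approximate_mode(60, 10, 20, 22, 24))  # 65.0
```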
5.7.2. Merits and Demerits of Mode
Merits. (i) Mode is easy to calculate and understand. In
some cases it can be located merely by inspection. It can also be
estimated graphically from a histogram (c.f. § 5.7.3).

(ii) Mode is not at all affected by extreme observations and as
such is preferred to arithmetic mean while dealing with extreme
observations.

(iii) It can be conveniently obtained in the case of open end
classes, which do not pose any problems here.


Demerits. (i) Mode is not rigidly defined. It is ill-defined if
the maximum frequency is repeated, or if the maximum frequency
occurs either at the very beginning or at the end of the distribution,
or if the distribution is irregular. In these cases, its value is located
by the method of grouping (c.f. Examples 5.37, 5.38). If the grouping
method also gives two values of mode, then the distribution is called a
bi-modal distribution (c.f. Example 5.39). We may also come across
distributions with more than two modes, in which case the distribution
is called multimodal. In the case of bimodal or multimodal
distributions, mode is not a representative measure of location and its
estimate is obtained by the empirical relation:

Mode = 3 Median − 2 Mean, discussed in § 5.8.

(ii) Since mode is the value of X corresponding to the maximum
frequency, it is not based on all the observations of the series.
Even in the case of a continuous frequency distribution [c.f.
Formula (5.18)], mode depends only on the frequencies of the modal class
and the classes preceding and succeeding it.

(iii) Mode is not suitable for further mathematical treatment.
For example, from the modal values and the sizes of two or more
series, we cannot find the mode of the combined series.




(iv) As compared with mean, mode is affected to a greater
extent by the fluctuations of sampling.

Uses. Being the point of maximum density, mode is specially
useful in finding the most popular size in studies relating to
marketing, trade, business and industry. It is the appropriate average to
be used to find the ideal size, e.g., in business forecasting, in the
manufacture of shoes or ready-made garments, in sales, in
production, etc.

5.7.3. Graphic Location of Mode. Mode can be located
graphically from the histogram of the frequency distribution by making
use of the rectangles erected on the modal, pre-modal and
post-modal classes. The method consists of the following steps:

(i) Join the top right corner of the rectangle erected on the
modal class with the top right corner of the rectangle erected on the
preceding class by means of a straight line.

(ii) Join the top left corner of the rectangle erected on the
modal class with the top left corner of the rectangle erected on the
succeeding class by a straight line.

(iii) From the point of intersection of the lines in steps (i) and
(ii) above, draw a perpendicular to the X-axis (the horizontal scale).
The abscissa (X-coordinate) of the point where this perpendicular
meets the X-axis gives the modal value.
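The graphic construction can be mimicked numerically. The sketch below, with the hypothetical name `graphic_mode`, intersects the two straight lines of steps (i) and (ii) and returns the abscissa of the intersection. It assumes equal class widths h, a modal class starting at l, and rectangle heights equal to the frequencies f0, f1 and f2. Under these assumptions the abscissa agrees with the interpolation formula (5.18), which is why the graph and the formula give the same modal value.

```python
def graphic_mode(l, h, f0, f1, f2):
    """Abscissa of the intersection used in the graphic location of mode.

    Line 1 joins (l, f0), the top right corner of the preceding rectangle,
    to (l + h, f1), the top right corner of the modal rectangle.
    Line 2 joins (l, f1), the top left corner of the modal rectangle,
    to (l + h, f2), the top left corner of the succeeding rectangle.
    """
    slope1 = (f1 - f0) / h
    slope2 = (f2 - f1) / h
    if slope1 == slope2:
        raise ValueError("the two lines are parallel; no intersection")
    # Solve f0 + slope1 * (x - l) = f1 + slope2 * (x - l) for x.
    return l + (f1 - f0) / (slope1 - slope2)


# Modal class 50-60 with f0 = 37, f1 = 45, f2 = 27 (illustrative values).
x_graph = graphic_mode(50, 10, 37, 45, 27)
x_formula = 50 + 10 * (45 - 37) / (2 * 45 - 37 - 27)
print(round(x_graph, 3), round(x_formula, 3))  # 53.077 53.077
```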

5.8. Empirical Relation Between Mean (M), Median (Md)
and Mode (Mo). In the case of a symmetrical distribution mean, median
and mode coincide, i.e., Mean = Median = Mode (c.f. Chapter 7
on Skewness). However, for a moderately asymmetrical
(non-symmetrical or skewed) distribution, mean and mode usually lie on the
two ends and median lies in between them, and they obey the
following important empirical relationship, given by Prof. Karl Pearson:

$$\text{Mode} = \text{Mean} - 3(\text{Mean} - \text{Median}) \qquad \ldots(5.19)$$
$$\Rightarrow\ \text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$$
$$\Rightarrow\ \text{Mean} - \text{Median} = \tfrac{1}{3}(\text{Mean} - \text{Mode}) \qquad \ldots(5.19a)$$

Thus we see that the difference between mean and mode is
three times the difference between mean and median. In other
words, median is closer to mean than mode. The above relation
between mean (M), median (Md) and mode (Mo) can be exhibited
diagrammatically as follows:


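[Figure: Relationship between arithmetic mean, median and mode on a skewed frequency curve. The mode lies under the peak of the curve, the median divides the area in halves, and the mean is at the centre of gravity.]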
Remarks. 1. Equation (5.19) may be rewritten to give:

$$\text{Mode} = \text{Mean} - 3\,\text{Mean} + 3\,\text{Median}$$
$$\Rightarrow\ \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \qquad \ldots(5.20)$$

This formula is specially useful to determine the value of mode
in case it is ill-defined, e.g., in the case of bimodal or multimodal
distributions [c.f. Example 5.40].

2. If we know any two of the three values M, Md and Mo,
the third can be estimated by using (5.20). The value so computed
will be more or less the same as that obtained by using the exact formula,
provided the distribution is moderately asymmetrical.

3. For a positively skewed distribution [c.f. Chapter 7], mean
will be greater than median and median will be greater than mode,
i.e.,

$$M > Md > Mo \ \Rightarrow\ Mo < Md < M$$

However, in a negatively skewed distribution the order of the
magnitudes of the three averages will be reversed, i.e., for a negatively
skewed distribution we have

$$Mo > Md > M \ \Rightarrow\ M < Md < Mo$$
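A minimal sketch of how the empirical relation Mode = 3 Median − 2 Mean can be used to estimate any one of the three averages from the other two. The function names are hypothetical, and the results are estimates that hold only for moderately asymmetrical distributions, as stated above.

```python
def mode_from(mean, median):
    """Estimate the mode from the mean and median: Mo = 3 Md - 2 M."""
    return 3 * median - 2 * mean


def mean_from(median, mode):
    """Estimate the mean from the median and mode: M = (3 Md - Mo) / 2."""
    return (3 * median - mode) / 2


def median_from(mean, mode):
    """Estimate the median from the mean and mode: Md = (2 M + Mo) / 3."""
    return (2 * mean + mode) / 3


def skew_direction(mean, median, mode):
    """Classify the ordering of the three averages."""
    if mode < median < mean:
        return "positively skewed"
    if mean < median < mode:
        return "negatively skewed"
    if mean == median == mode:
        return "symmetrical"
    return "ordering not covered by the empirical rule"


print(round(mode_from(42.2, 41.9), 1))      # 41.3
print(round(mean_from(47.8, 48.0), 1))      # 47.7
print(skew_direction(42.2, 41.9, 41.3))     # positively skewed
```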


Example 5.28. Calculate the median and mode for the
distribution of the weights of 150 students from the data given below:

Weight in kg. : 30—40  40—50  50—60  60—70  70—80  80—90
Frequency     : 18     37     45     27     15     8

[Delhi Uni. B.A. Eco. (Hons.) 1975]
Solution.

COMPUTATION OF MEDIAN AND MODE

Weight in kg.    Frequency (f)    (Less than) c.f.
30—40            18               18
40—50            37               55
50—60            45               100
60—70            27               127
70—80            15               142
80—90            8                150
Total            150

Median. Here $N/2 = 150/2 = 75$. The c.f. just greater than $N/2 = 75$
is 100. Hence the corresponding class, viz., 50—60, is the median
class. Using the median formula we get:

$$Md = 50 + \frac{10}{45}(75 - 55) = 50 + \frac{2}{9} \times 20 = 50 + 4.44 = 54.44 \text{ kg.}$$

Mode. Since the distribution is regular, the class
corresponding to the maximum frequency, viz., 45, is the modal class. Hence
the modal class is 50—60 and using the mode formula we get:

$$M_o = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 50 + \frac{10(45 - 37)}{2 \times 45 - 37 - 27} = 50 + \frac{10 \times 8}{26} = 50 + 3.077 = 53.077 \text{ kg.}$$
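The median step of this example can be reproduced with a short sketch. The function `grouped_median` is a hypothetical helper, not part of the text. It assumes equal class widths and takes the frequency of the first class (30 to 40) as 18, the value implied by the total of 150 students.

```python
def grouped_median(lower_limits, h, freqs):
    """Median of a continuous frequency distribution by interpolation.

    Walks through the classes, accumulating the 'less than' cumulative
    frequency, and interpolates inside the first class whose cumulative
    frequency reaches N/2.
    """
    n = sum(freqs)
    cumulative = 0
    for l, f in zip(lower_limits, freqs):
        if cumulative + f >= n / 2:
            return l + h * (n / 2 - cumulative) / f
        cumulative += f
    raise ValueError("no frequencies supplied")


lower_limits = [30, 40, 50, 60, 70, 80]
freqs = [18, 37, 45, 27, 15, 8]   # first frequency assumed from N = 150

print(round(grouped_median(lower_limits, 10, freqs), 2))  # 54.44
```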


Example 5.29. Find the value of mean and mode from the
data given below:

Weight (in kg)   No. of Students   Weight (in kg)   No. of Students
93—97            2                 113—117          14
98—102           5                 118—122          6
103—107          12                123—127          3
108—112          17                128—132          1

[Delhi Uni. B.A. (Econ. Hons.) 1982]

Solution. Since the formula for mode requires the
distribution to be continuous with ‘exclusive type’ classes, we first convert
the classes into class boundaries as given in the following table:

COMPUTATION OF MEAN AND MODE

Weight       Class            Mid-value    No. of
(in kg)      boundaries       (X)          students (f)
93—97        92.5—97.5        95           2
98—102       97.5—102.5       100          5
103—107      102.5—107.5      105          12
108—112      107.5—112.5      110          17
113—117      112.5—117.5      115          14
118—122      117.5—122.5      120          6
123—127      122.5—127.5      125          3
128—132      127.5—132.5      130          1

$$\text{Mean} = A + \frac{h \sum fd}{N} = 110 + \frac{5 \times 8}{60} = 110.66 \text{ kg.}$$

Here the maximum frequency is 17. The corresponding class
107.5—112.5 is the modal class. Using the mode formula, we
get:

$$\text{Mode} = l + \frac{h(f_1 - f_0)}{2f_1 - f_0 - f_2} = 107.5 + \frac{5 \times (17 - 12)}{2 \times 17 - 12 - 14} = 107.5 + \frac{25}{8} = 107.5 + 3.125 = 110.625 \text{ kg.}$$
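The conversion from inclusive classes to class boundaries, followed by the mode formula, can be sketched as below. The helper name `to_class_boundaries` is hypothetical. It assumes that the gap between consecutive classes is constant and splits it equally between the two neighbouring classes, which is what the table above does with a gap of 1 kg.

```python
def to_class_boundaries(classes):
    """Convert inclusive classes such as (93, 97), (98, 102) into
    exclusive class boundaries such as (92.5, 97.5), (97.5, 102.5)."""
    gap = classes[1][0] - classes[0][1]   # e.g. 98 - 97 = 1
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in classes]


classes = [(93, 97), (98, 102), (103, 107), (108, 112),
           (113, 117), (118, 122), (123, 127), (128, 132)]
freqs = [2, 5, 12, 17, 14, 6, 3, 1]

boundaries = to_class_boundaries(classes)

# The modal class is the one with the maximum frequency (regular distribution).
k = freqs.index(max(freqs))
l, upper = boundaries[k]
h = upper - l
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = l + h * (f1 - f0) / (2 * f1 - f0 - f2)

print(boundaries[k], mode)  # (107.5, 112.5) 110.625
```

The indexing `freqs[k - 1]` and `freqs[k + 1]` is safe here only because the modal class is neither the first nor the last class; the text notes that the formula is not practicable in those situations.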


Example 5.30. Determine the mode of the following data.

Solution. In this case, the distribution is irregular, since the
frequencies increase up to 9, then decrease to 2, again go
up to 10 and then start decreasing. As the distribution is not regular
we cannot say that the value of mode is 24 merely because the maximum
frequency is 10. Here we try to locate mode by the method of
grouping as explained below.

The frequencies in column (1) are the original frequencies.
Column (2) is obtained by combining the frequencies two by
two in column (1). Column (3) is obtained on combining the
frequencies two by two in column (1) after leaving the first frequency.
If we leave the first two frequencies and combine the frequencies
two by two in column (1) we shall get a repetition of values obtained
in column (2). Hence we proceed to combine the frequencies in
column (1) three by three to get column (4). The combination of
frequencies three by three after leaving the first frequency and the first
two frequencies in column (1) results in columns (5) and (6)
respectively. If we combine the frequencies three by three after leaving
the first three frequencies in column (1), we get a repetition of
values obtained in column (4). The maximum frequency in each
column is represented by ‘Italic’ type. For computing the value of
mode, we prepare the following table.


COMPUTATION OF MODE: GROUPING TABLE

[Grouping table: size of item X against frequency columns (1) to (6).]

ANALYSIS TABLE

[Analysis table: for each column, the maximum frequency and the value or combination of values of the variable X giving that maximum, followed by the number of times each value of X occurs.]

Since the value 25 of the variable occurs the maximum number
of times (5 times), the mode is 25.
Example 5.31. Construct a frequency distribution showing the
frequencies with which words of different numbers of letters occur in the
extract reproduced below (omitting punctuation marks and treating as the
variable the number of letters in each word) and obtain the median and
the mode of the distribution.
A candidate at the time of applying for registration as a student
of the institute should be not less than eighteen years of age and have
passed the intermediate examination of a university constituted by law
in India or an examination recognized by Central Government as
equivalent thereto, or the National Diploma in Commerce Examination
or the Diploma in Rural Service Examination conducted by the
National Council of Rural Higher Education.


Solution. Here the variable X represents the number of letters
in each word. For example, in the word ‘candidate’ there are 9 letters
c, a, n, d, i, d, a, t and e. Hence X, corresponding to the word
‘candidate’, is 9. Thus replacing each word by the number of letters in it,
the distribution of the number of letters in each word in the given
paragraph is as follows:

1, 9, 2, 3, 4, 2, 8, 3, 12, 2, 1, 7, 2, 3, 9, ..., 5, 6, 9.

The above data can be arranged in the form of a frequency
distribution as follows:

COMPUTATION OF MEDIAN

No. of letters (X)    Frequency (f)    Less than c.f.
1                     3                3
2                     19               22
3                     12               34
4                     4                38
5                     4                42
6                     3                45
7                     7                52
8                     6                58
9                     3                61
10                    4                65
11                    5                70
12                    2                72
Total                 N = 72

Median. Here $N/2 = 72/2 = 36$. Since the c.f. just greater than
36 is 38, the corresponding value of X is the median, which is 4.


Mode: Since the above frequency distribution is not regular, 
the value of mode is located by the method of grouping. 


COMPUTATION OF MODE: GROUPING TABLE

[Grouping table: number of letters X against frequency columns (1) to (6).]

For computing the value of mode we prepare the following
table:

ANALYSIS TABLE

[Analysis table: for each column, the maximum frequency and the value or combination of values of X corresponding to it, followed by the number of times each value of X occurs.]

Since the value 2 is repeated maximum number (5) of times, 
the mode is 2. 
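The method of grouping used in Examples 5.30 and 5.31 can be sketched as a short Python routine. This is an illustrative sketch, not the book's own procedure in code: the function name `grouping_mode` is hypothetical, and ties for the maximum within a column are handled by counting every tied group, which is an assumption about how the analysis table treats them. The data are the word-length frequencies of Example 5.31.

```python
from collections import Counter


def grouping_mode(values, freqs):
    """Locate the mode by the method of grouping.

    Builds the six columns of the grouping table (frequencies taken
    singly, in twos, in twos after leaving the first, in threes, and in
    threes after leaving the first one and the first two), finds the
    maximum total in each column, and tallies how often each value
    belongs to a maximum group. Returns (tally, values with top tally).
    """
    tally = Counter()
    n = len(freqs)
    # (group size, number of leading frequencies left out) for columns (1)-(6)
    for size, skip in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [range(i, i + size) for i in range(skip, n - size + 1, size)]
        if not groups:
            continue
        totals = [sum(freqs[j] for j in g) for g in groups]
        best = max(totals)
        for g, total in zip(groups, totals):
            if total == best:
                tally.update(values[j] for j in g)
    top = max(tally.values())
    return tally, [v for v in values if tally[v] == top]


letters = list(range(1, 13))
freqs = [3, 19, 12, 4, 4, 3, 7, 6, 3, 4, 5, 2]

tally, winners = grouping_mode(letters, freqs)
print(winners, tally[2])  # [2] 5
```

If `winners` contains more than one value, the method of grouping has failed to single out a mode and the empirical relation of § 5.8 is the usual fallback.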
Example 5.32. Calculate mode from the following data:

Marks       No. of Students    Marks       No. of Students
Below 10    4                  Below 60    86
  "   20    6                    "   70    96
  "   30    24                   "   80    99
  "   40    46                   "   90    100
  "   50    67

(Delhi Uni. B. Com. 1977)


Solution. Since we are given the cumulative frequency
distribution of marks, first we shall convert it into the frequency
distribution as given below:

Marks    : 0—10  10—20  20—30  30—40  40—50  50—60  60—70  70—80  80—90
Students : 4     2      18     22     21     19     10     3      1

Further, since the frequencies first decrease, then increase and
again decrease, the distribution is irregular and hence the modal
class is located by the method of grouping as explained in the table
given below:

GROUPING TABLE

[Grouping table: class intervals of marks against frequency columns (1) to (6).]

For computing the modal class, we prepare another table as given
below:

ANALYSIS TABLE

[Analysis table: for each column, the class or combination of classes corresponding to the maximum frequency, followed by the number of times each class occurs.]

In the above table there are two classes, viz., 30—40 and
40—50, which are repeated the maximum number (5) of times and as such
we cannot decide about the modal class. Thus, even the method of
grouping fails to give the modal class.

We say that in the above example mode is ill-defined and we
locate it by the empirical formula:

$$Mo = 3Md - 2M \qquad \ldots(*)$$


COMPUTATION OF ARITHMETIC MEAN AND MEDIAN

$$\text{Mean} = A + \frac{h \sum fd}{N} = 45 + \frac{10 \times (-28)}{100} = 45 - 2.8 = 42.2$$

Here $N/2 = 100/2 = 50$. Since the c.f. just greater than 50
is 67, the corresponding class 40—50 is the median class.
Hence,

$$Md = 40 + \frac{10}{21}\left(\frac{100}{2} - 46\right) = 40 + \frac{10 \times 4}{21} = 40 + 1.9 = 41.9$$

Substituting in (*) we get:

$$\text{Mode} = 3 \times 41.9 - 2 \times 42.2 = 125.7 - 84.4 = 41.3$$
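The whole of this solution (differencing the cumulative frequencies, then computing the mean, the median and the empirical mode) can be traced in a brief sketch. Variable names are illustrative only. The sketch computes the mean directly from mid-values rather than by the step-deviation method used above, which gives the same result.

```python
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
cumulative = [4, 6, 24, 46, 67, 86, 96, 99, 100]   # 'below' cumulative frequencies
h = 10

# Simple frequencies are successive differences of the cumulative frequencies.
freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

n = cumulative[-1]
mid_values = [u - h / 2 for u in upper_limits]
mean = sum(f * x for f, x in zip(freqs, mid_values)) / n

# Median class: first class whose cumulative frequency exceeds N/2.
k = next(i for i, c in enumerate(cumulative) if c > n / 2)
below = cumulative[k - 1] if k > 0 else 0
median = (upper_limits[k] - h) + h * (n / 2 - below) / freqs[k]

# Empirical relation, used because the mode is ill-defined here.
mode = 3 * median - 2 * mean

print(freqs)                                       # [4, 2, 18, 22, 21, 19, 10, 3, 1]
print(round(mean, 1), round(median, 1), round(mode, 1))  # 42.2 41.9 41.3
```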


Example 5.33. Below is given the frequency distribution of
weights of a group of 60 students of a class in a school:

Weight in kg.   Number of students   Weight in kg.   Number of students
30—34           3                    50—54           14
35—39           5                    55—59           6
40—44           12                   60—64           2
45—49           18

(a) Draw a histogram for this distribution and find the modal
value.

(b) (1) Prepare the cumulative frequency (both less than and
more than types) distribution, and (2) represent them graphically on
the same graph paper. Hence find the median.

(c) With the modal and the median values as obtained in (a)
and (b), use an appropriate empirical formula to find the arithmetic
mean of this distribution.

Solution. To draw the histogram and cumulative frequency
curves (both less than and more than types) we first convert the
distribution into continuous class intervals as given in the following
table.

Weight in kgs    Number of        Less than    More than
                 students (f)     c.f.         c.f.
29.5—34.5        3                3            60
34.5—39.5        5                8            57
39.5—44.5        12               20           52
44.5—49.5        18               38           40
49.5—54.5        14               52           22
54.5—59.5        6                58           8
59.5—64.5        2                60           2
(a) Histogram. The histogram is obtained on erecting rectangles
on the class intervals with heights proportional to the corresponding
class frequencies.

[Fig. 5.5. Histogram: class intervals (29.5 to 64.5) on the X-axis, frequency on the Y-axis.]

Mode = OP = 48.


(b) The ‘less than’ and ‘more than’ cumulative frequency curves (ogives)
are drawn in the following diagram.

[Fig. 5.6. Ogives: ‘less than’ and ‘more than’ ogives, with class intervals (34.5 to 64.5) on the X-axis and cumulative frequency (0 to 60) on the Y-axis; $Q_3 = 52.0$ (approx.) is marked on the less than ogive.]
From the point of intersection of the two curves (ogives) draw a
perpendicular on the X-axis, meeting the X-axis at Q. Then OQ = 47.8 kgs.
gives the median weight.

(c) The empirical relation between mean, median and mode
is given by:

$$Mo = 3Md - 2M \ \Rightarrow\ M = \frac{3Md - Mo}{2} = \frac{3 \times 47.8 - 48}{2} = \frac{143.4 - 48}{2} = 47.700 \text{ kgs.}$$

EXERCISE 5.3


1. What do you understand by mode ? Discuss its relative merits and 
demerits as a measure of central tendency. Also give two practical situations 
Where you will recommend the use of mode. 


2. What are the desiderata of a good average ? Compare the mean, the 
median and the mode in the light of these desiderata. Why are averages called 
measures of central tendency ? 


[Calicut Uni. B.A, (Econ.) April 1978: Mysore Uni. B. Com., Oct. 1978] 


3. How would you account for the predominant choice of arithmetic
mean of statistical data as a measure of central tendency? Under what
circumstances would it be appropriate to use mode or median?


[Osmania U. B. Com. (Hons.) Nov, 1981] 


4. Point out the merits and demerits of the mean, the median and the
mode as measures of central tendency of numerical data.

5. Compare, giving illustrations, the arithmetic mean, the median and
the mode in regard to:

(a) the effect of extreme items in computation,
(b) ease in computation,
(c) stability in sampling situations,
(d) existence of the average as an actual case, and
(e) popular use.

6. The following figures represent the number of books issued at the
counter of a commerce college library on 12 different days:

96, 180, 98, 75, 270, 80, 102, 100, 94, 75, 200, 610.

Calculate the arithmetic mean, median and mode for these data. Which
of these would represent the above data best?

Ans. Mean = 165, Median = 99, Mode = 75.
7. The Bharat Ball Bearings Ltd. has collected the following data:

Values of X
12   19   21   30
13   19   22   31
17   20   24   31
18   21   27   31

(i) Compute the arithmetic mean, the median and the mode using the
sixteen observations given.

(ii) Why is the mode said to be an erratic measure of central tendency?
(iii) Why is the median called a position average?

(Punjab U. B. Com. 1979)
Ans. A.M. = 22.25, Md. = 21, Mo. = 31.


8. Calculate mean, median and mode from the following data of the 
heights in inches of a group of students : 
61, 62, 63, 61, 63, 64, 64, 60, 65, 63, 64, 65, 66, 64. 


Now suppose that a group of students whose heights are 60, 66, 59, 68,
67 and 70 inches is added to the original group. Find the mean, median and mode
of the combined group.
(Punjab U. B. Com. 1981)
Ans. First group: M = 63.2, Md = 63.5, Mo = 64;
Combined group: M = 63.75, Md = 64, Mo = 64.

9. Atul gets a pocket money allowance of Rs. 12 per month, Thinking 
that this was rather less, he asked his friends about their allowances 
and obtained the following data which includes his allowance  also— 
(amounts in Rs.) 

12, 18, 10, 5, 25, 20, 20, 22, 15, 10, 10, 15, 13, 20, 18, 10, 15, 10, 18, 

15, 12, 15, 10, 15, 10, 12, 18, 20, 5, 8. 


He presented these data to his father and asked for an increase in his 
allowance as he was getting less than average amount. His father, a statistician, 
countered pointing out that Atul’s allowance was actually more than the 
average amount, 


Reconcile these statements. 
(Bombay Univ. B. Com., April 1975)

Ans. Atul computed the A.M. and his father computed the mode.


10. Find the mean and mode from the following data : 


Marks           : 10—25  25—40  40—55  55—70  70—85  85—100
No. of students : 6      50     44     26     3      1
(Guru Nanak Dev U. B. Com. II Sept. 1983)

Ans. Mean = 44.38, Mode = 38.20


11. Calculate (a) the median, (b) the mode and (c) the two quartiles from the
following data:

Age      No. of persons    Age      No. of persons
20—25    100               40—45    300
25—30    140               45—50    240
30—35    200               50—55    140
35—40    360               55—60    120

(Nagarjuna U. B. Com. April 1981)
Ans. Md = 40, Mo = 38.64, $Q_1 = 34$, $Q_3 = 47.08$
12. An employer decides to offer a cash gift of 5% of the average
monthly wage in his factory to every employee. Calculate it taking the average to
be (i) the mode, (ii) the median.

Monthly wage    No. of       Monthly wage    No. of
(Rs.)           Employees    (Rs.)           Employees
20—30           28           60—70           56
30—40           32           70—80           40
40—50           45           80—90           20
50—60           60

(Kurukshetra U. B. Com. Sept. 1978)
Ans. Mo = Rs. 57.89, Cash gift per employee = Rs. 2.90;
Md = Rs. 55.92, Cash gift per employee = Rs. 2.80.


13. Given below is the frequency distribution of marks obtained by 90
students. Compute the arithmetic mean, median and mode.

Marks    No. of students    Marks    No. of students
15—19    6                  45—49    9
20—24    14                 50—54    10
25—29    12                 55—59    5
30—34    10                 60—64    4
35—39    10                 65—69    1
40—44    9

[I.C.W.A. (Intermediate) Dec. 1980]
Ans. Mean = 37.17, Md = 36, Mo = 23.5.


14. Find out the Mean, Median and the Mode in the following series — 

Size (below) 5 10 15 20 25 30 35 

Frequency 1 3 13 17 27 36 38 
[Kurukshetra U. B. Com. (Sept.) 1980]

Ans. Mean = 19.74, Md = 21, Mo = 24.3.

15. Calculate the modal value from the following data:

Income (Rs.)      No. of persons    Income (Rs.)      No. of persons
Less than 100     8                 Less than 400
  "    "  200     22                  "    "  500     67
  "    "  300     35                  "    "  600     70

(Punjab U. B. Com. Sept. 1981)
Ans. Mode = Rs. 340


16. Calculate the median and mode of the following:

Annual Sales     Frequency    Annual Sales     Frequency
(Rs. '000)                    (Rs. '000)
Less than 10     4            Less than 40     55
  "    "  20     20             "    "  50     62
  "    "  30     35             "    "  60     67

Is it possible to calculate the arithmetic mean? If possible, calculate it.
[I.C.W.A. (Intermediate) June 1983]
Ans. Md = 29, Mo = 32.78; Yes, it is possible to find the mean. Mean = 28.73


17. Calculate the mean and the median of the
distribution given below. Hence calculate the mode using the empirical relation
between the three.

Class limits    Frequency    Class limits    Frequency
130—134         5            150—154         17
135—139         15           155—159         10
140—144         28           160—164         1
145—149         24
                             Total           100

[I.C.W.A. (Intermediate) Dec. 1984]
Ans. M = 145.35, Md = 144.92, Mo = 144.06.
18. Compute mode from the following data relating to dividend paid
by joint stock companies in the year 1976:

Dividend (in % of    No. of       Dividend (in % of    No. of
the share value)     companies    the share value)     companies
0—1.0                10           10.0—12.5            104
1.0—2.5              15           12.5—14.0            250
2.5—3.5              20           14.0—15.0            164
3.5—4.0              23           15.0—17.5            278
4.0—5.0              30           17.5—19.0            100
5.0—7.5              172          19.0—20.0            72
7.5—9.0              60           20.0—22.5            331
9.0—10.0             39           22.5—25.0            150

[Delhi Uni. B. Com. 1978]
Hint: Rewrite the frequency distribution with classes of equal magnitude
2.5, viz., 0—2.5, 2.5—5.0, 5.0—7.5, ..., 22.5—25.0.
Ans. Mode = 14.23 (% of the share value).


19. Calculate the Mode, Median and Arithmetic average from the 
following data. 


Class f Class f 
0—2 8 25—30 45 
2—4 12 30 —40 60 
4—10 20 40—50 20 
10—15 10 50—60 13 
15—20 16 60— 80 15 
20—25 25 80—100 4 


(Punjab U. B. Com. II, Sep. 1982) 


Hint: Rewrite the frequency distribution with classes of equal
magnitude 10.
Ans. Mode = 28.15, Md = 28.29, Mean = 30.08
20. In the following data two class frequencies are missing.

Class       Frequency    Class       Frequency
100—110     4            150—160     ?
110—120     7            160—170     16
120—130     15           170—180     10
130—140     ?            180—190     6
140—150     40           190—200     3

However, it was possible to ascertain that the total number of
frequencies was 150 and that the median has been correctly found to be 146.25.

You are required to find out with the help of the information given:

(i) The two missing frequencies.
(ii) Having found the missing frequencies, the arithmetic mean.
(iii) Without using the direct formula, the value of the mode.

Ans. (i) 24, 25; (ii) $\bar{X} = 147.33$; (iii) Mode = 144.08

21. The median and mode of the following wage distribution are known
to be Rs. 33.5 and Rs. 34 respectively. Three frequency values from the table are,
however, missing. You are required to find out those values.

Wages in Rs.    No. of persons    Wages in Rs.    No. of persons
0—10            4                 40—50           ?
10—20           16                50—60           6
20—30           ?                 60—70           4
30—40           ?
                                  Total           230

(Guru Nanak Dev Uni. B. Com. 1980)
Ans. 60, 100, 40.


22. “Hari put the jar of water and the packet of sweets on the ground
and sat down in the shade of the tree and waited.”

Prepare a frequency distribution for the words in the above sentence
taking the number of letters in words as the variable. Calculate the mean,
median and mode.

[Bombay Uni., B. Com. 1973]
Ans. Mean = 3.56, Median = Mode = 3.


23. Treating the number of letters in each word in the following passage
as the variable X, prepare the frequency distribution table and obtain its mean,
median and mode.

“The reliability of data must always be examined before any attempt is
made to base conclusions upon them. This is true of all data, but particularly
so of numerical data, which do not carry their quality written large on them. It
is a waste of time to apply the refined theoretical methods of Statistics to data
which are suspect from the beginning.”

Ans. Mean = 4.565, Median = 4, Mode = 3.

24, What is the relationship between mean, median and mode ? 


Find out the mode of the following data graphically and check the result 
through calculation, 


Size 0—1 1-2 2—3 3-4 4-5 5-6 6—7 7-8 8—9 9—10 10-11 
Frequency 3 Ул. n9 15 25 20 14 р 8 6 2 


(Delhi Uni. B. Com. 1981)
Ans. Mode = 4.67.


25. Calculate mode from the following data:

Monthly wage    No. of     Monthly wage    No. of
(in Rs.)        workers    (in Rs.)        workers
200—250         4          400—450         33
250—300         6          450—500         17
300—350         20         500—550         8
350—400         12         550—600         2

Calculate mode by the graphic method also.

[Delhi Uni. B. Com. (External) 1982]
Hint. Since the distribution is irregular, find mode by the “method of grouping.”
Ans. Mo = 428.38


26. The monthly profits in rupees of 100 shops are distributed as follows:

Profits per shop    No. of shops
0-100               12
100-200             18
200-300             27
300-400             20
400-500             17
500-600             6

Determine the modal value of the above distribution graphically and verify the result by calculation.

[Delhi Uni. B. Com. (Hons.) 1981]
Ans. Rs. 256.25.


27. In a moderately skewed distribution:

(a) Arithmetic mean = 24.6 and the mode = 26.1. Find the value of the median and explain the reason for the method employed.

(b) In a moderately asymmetrical distribution the value of median is 42.8 and the value of mode is 40. Find the mean.

Ans. (a) 25.1, (b) 44.2.
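Both parts of Question 27 rest on the empirical relation Mode = 3 Median - 2 Mean for moderately skewed distributions. A small sketch of the rearrangements follows; the function names are hypothetical and chosen only for illustration.

```python
def median_from(mean, mode):
    # Mode = 3*Median - 2*Mean  =>  Median = (Mode + 2*Mean) / 3
    return (mode + 2 * mean) / 3

def mean_from(median, mode):
    # Mode = 3*Median - 2*Mean  =>  Mean = (3*Median - Mode) / 2
    return (3 * median - mode) / 2

print(round(median_from(24.6, 26.1), 2))  # part (a)
print(round(mean_from(42.8, 40), 2))      # part (b)
```

The relation is only approximate, so these values are estimates rather than exact results.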

28. (a) Find out the missing figures:

(a) Mean = ? (3 Median - Mode).

(b) Mean - Mode = ? (Mean - Median).

(c) Median - Mode = ? (Mean - Mode).

(d) Mode - Mean = ? (Mean - Median).

(Calicut Uni. B. Com. October 1972)

Ans. (a) 1/2, (b) 3, (c) 2/3, (d) 3.

(b) Which average would you use in the following situations:

(i) Sale of shirts 16", 15½", 15", 15", 14", 13", 15".

(ii) Marks obtained 10, 8, 12, 4, 7, 11 and X (X < 5). Justify your answer.

(Bombay Uni. B. Com. 1976)
Ans. (i) Mode, (ii) Median.

(c) A.M. and Median of 50 items are 100 and 95 respectively. At the time of calculations two items 180 and 90 were wrongly taken as 100 and 10. What are the correct values of Mean and Median?

Ans. Mean = 103.2; Median is the same, viz., 95.


(d) Can the values of mean, mode and median be the same? If yes, state the situation. (Delhi Uni. B. Com. 1979)
Ans. M = Md = Mo for a symmetrical distribution.

(e) Find out the missing figure:
Mean = ? (3 Median - Mode)
(Delhi Uni. B. Com. 1979)
Ans. 1/2

(f) In a moderately asymmetrical distribution, the values of mode and median are 20 and 24 respectively. Locate the value of mean.
(Delhi Uni. B. Com. 1979)
Ans. 26 [since Mean = (3 x 24 - 20)/2].
29. Fill in the blanks:
(i) ...... can be calculated from a frequency distribution with open end classes.
(ii) In the calculation of ......, all the observations are taken into consideration.
(iii) ...... is not affected by extreme observations.
(iv) Average rainfall of a city from Monday to Saturday is 0.3 inch. Due to heavy rainfall on Sunday, the average rainfall for the week increases to 0.5 inch. The rainfall on Sunday was ......
(Kerala Uni. B. Com. Oct. 1977)
(v) The sum of squared deviations is minimum when taken from ......
(vi) The sum of absolute deviations is minimum when taken from ......
(vii) Median = ...... Quartile.
(viii) Mean is ...... by extreme observations.
(ix) Median is the average suited for ...... classes.
(x) For studying phenomena like intelligence and honesty ...... is a better average to be used, while for phenomena like size of shoes or readymade garments the average to be preferred is ......
(Bangalore Uni. B. Com. 1977)
(xi) Typist A can type a sheet in 5 minutes, typist B in 6 minutes and typist C in 8 minutes. The average number of sheets typed per hour per typist is ...... (Bombay Uni. B. Com. Oct. 1976)
(xii) The mean of 10 observations is 20 and median is 15. If 5 is added to each observation, the new mean is ...... and median is ......
(xiii) A distribution with two modes is called ...... and with more than two modes is called ......
(xiv) Average suited for a qualitative phenomenon is ......
(xv) If 25% of the observations lie above 80, 40% of the observations are less than 50 and 70% are greater than 40, then
...... = 80; ...... = 50; ...... = 40
(xvi) Relationship between Md, Q1, Q2 and Q3 is ......
(хуй) Ds, Pss, Md, D, and Py are related by...... 
(xvili) Relationship between Dj, Qs, Poo, Prs and О, is...... 
(xix) The empirical relationship between mean, median and mode for a moderately asymmetrical distribution is ......
(xx) If the maximum frequency is repeated then mode is located by the method of ......

Ans. (i) Md or Mo (ii) Mean (iii) Md or Mo (iv) 1.7" (v) Mean (vi) Median (vii) Second (viii) Very much affected (ix) Open end (x) Median, Mode (xi) 9.83 (xii) 25, 20 (xiii) Bi-modal, Multi-modal (xiv) Median (xv) Q3 = 80, P40 = D4 = 50, P30 = D3 = 40 (xvi) Q1 ≤ Q2 = Md ≤ Q3 (xvii) DP, Md D,«Py (xviii) D< QS Pax P 0, (xix) Mo = 3Md - 2M (xx) Grouping.


30. State, giving reasons, the average to be used in the following situations:

(i) To determine the average size of the shoe sold in a shop.
(ii) To determine the size of agricultural holdings.
(iii) To determine the average wages in an industrial concern.
(iv) To find the per capita income in different cities.
(v) To find the average beauty among a group of students in a class.
Ans. (i) Mode; (ii), (iii) and (iv) Median; (v) Mean.
5.9. Geometric Mean. The geometric mean (usually abbreviated as G.M.) of a set of n observations is the nth root of their product. Thus if $X_1, X_2, \ldots, X_n$ are the given n observations, then their G.M. is given by

$$\text{G.M.} = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n} = (X_1 X_2 \cdots X_n)^{1/n}$$ ...(5.21)

If n = 2, i.e., if we are dealing with two observations only, then G.M. can be computed by taking the square root of their product. For example, G.M. of 4 and 16 is

$$\sqrt{4 \times 16} = \sqrt{64} = 8.$$


But if n, the number of observations, is greater than 2, then the computation of the nth root is very tedious. In such a case the calculations are facilitated by making use of logarithms. Taking logarithm of both sides in (5.21), we get

$$\log(\text{G.M.}) = \frac{1}{n}\log(X_1 X_2 \cdots X_n) = \frac{1}{n}(\log X_1 + \log X_2 + \cdots + \log X_n) = \frac{1}{n}\sum \log X$$ ...(5.21a)

Thus we see that the logarithm of the G.M. of a set of observations is the arithmetic mean of their logarithms.

Taking Antilog of both sides in (5.21a) we finally obtain

$$\text{G.M.} = \text{Antilog}\left[\frac{1}{n}\sum \log X\right]$$ ...(5.21b)

In case of a frequency distribution

X :  $X_1$   $X_2$   ...   $X_n$
f :  $f_1$   $f_2$   ...   $f_n$

where the total number of observations is $N = \sum f$,

$$\text{G.M.} = \left[(X_1 \times X_1 \times \cdots f_1 \text{ times}) \times (X_2 \times X_2 \times \cdots f_2 \text{ times}) \times \cdots \times (X_n \times X_n \times \cdots f_n \text{ times})\right]^{1/N} = \left(X_1^{f_1} \times X_2^{f_2} \times \cdots \times X_n^{f_n}\right)^{1/N}$$ ...(5.22)

Taking logarithm of both sides in (5.22) we get

$$\log(\text{G.M.}) = \frac{1}{N}\sum f \log X \;\Rightarrow\; \text{G.M.} = \text{Antilog}\left[\frac{1}{N}\sum f \log X\right]$$ ...(5.22a)

In the case of grouped or continuous frequency distributions, the values of X are the mid-values of the corresponding classes.

Steps for the Computation of G.M. in (5.22a)

1. Find log X, where X is the value of the variable or the mid-value of the class (in case of grouped or continuous frequency distribution).

2. Compute f x log X, i.e., multiply the values of log X obtained in step 1 by the corresponding frequencies.

3. Obtain the sum of the products f log X in step 2 to get Σ f log X.

4. Divide the sum obtained in step 3 by N, the total frequency.

5. Take the Antilog of the value obtained in step 4. The resulting figure gives the value of G.M.
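The five steps above can be sketched in a few lines of Python. This is an illustration only: the function name `geometric_mean` is hypothetical, base-10 logarithms stand in for the log tables, and the data are those of Example 5.35 in this chapter.

```python
import math

def geometric_mean(values, freqs):
    """G.M. of a frequency distribution via formula (5.22a)."""
    total = sum(freqs)                                        # N = sum of f
    log_sum = sum(f * math.log10(x)                           # steps 1 to 3
                  for x, f in zip(values, freqs))
    return 10 ** (log_sum / total)                            # steps 4 and 5

x = [2, 3, 4, 5, 6, 7, 8]
f = [2, 4, 6, 2, 3, 2, 1]
gm = geometric_mean(x, f)
print(round(gm, 3))
```

Working with full floating-point logarithms gives a value that agrees with the four-figure log-table result (4.192) to three decimal places.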
5.9.1. Merits and Demerits of Geometric Mean.
Merits: (i) Geometric mean is rigidly defined.
(ii) It is based on all the observations.

(iii) It is suitable for further mathematical treatment. If $G_1$ and $G_2$ are the geometric means of two series of sizes $n_1$ and $n_2$ respectively, then the geometric mean G of the combined series of size $n_1 + n_2$ is given by

$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$ ...(5.23)

Remark. The above result can be easily generalised to the case of k series as follows:

If $G_1, G_2, \ldots, G_k$ are the geometric means of the k series of sizes $n_1, n_2, \ldots, n_k$ respectively, then the geometric mean G of the combined series of size $n_1 + n_2 + \cdots + n_k$ is given by:

$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k}$$ ...(5.23a)

(iv) Unlike arithmetic mean, which has a bias for higher values, geometric mean has a bias for smaller observations and as such is quite useful in phenomena (such as prices) which have a lower limit (prices cannot go below zero) but have no such upper limit.

(v) As compared with mean, G.M. is affected to a lesser extent by extreme observations.

(vi) It is not affected much by fluctuations of sampling.

Demerits. (i) Because of its abstract mathematical character, geometric mean is not easy to understand and to calculate for a non-mathematical person.

(ii) If any one of the observations is zero, geometric mean becomes zero, and if any one of the observations is negative, geometric mean becomes imaginary regardless of the magnitude of the other items.


Uses. In spite of its merits and limitations, geometric mean is specially useful in averaging ratios, percentages, and rates of increase between two periods. For example, G.M. is the appropriate average to be used for computing the average rate of growth of population or average increase in the rate of profits, sales, production, etc.

Geometric mean is used in the construction of Index Numbers. Irving Fisher's ideal index number is based on geometric mean [See Chapter on Index Numbers].

While dealing with data pertaining to economic and social sciences we usually come across situations where it is desired to give more weightage to smaller items and small weightage to larger items. G.M. is the most appropriate average to be used in such cases.


5.9.2. Compound Interest Formula. Let us suppose that $P_0$ is the initial value of the variable (i.e., the value of the variable in the beginning), $P_n$ is its value at the end of the period n, and r is the rate of growth per unit per period.

Since r is the rate of growth per unit per period, growth for period 1 is $P_0 r$ and thus the value of the variate at the end of period 1 is $P_0 r + P_0 = P_0(1+r)$. For the 2nd period the initial value of the variable becomes $P_0(1+r)$. The growth for the 2nd period is $P_0(1+r)r$ and consequently the value of the variate at the end of the 2nd period is

$$P_0(1+r) + P_0(1+r)r = P_0(1+r)[1+r] = P_0(1+r)^2$$

Proceeding similarly, the value of the variable at the end of period 3 is

$$P_0(1+r)^2 + P_0(1+r)^2 r = P_0(1+r)^2[1+r] = P_0(1+r)^3$$

and finally, its value at the end of period n will be given by

$$P_n = P_0(1+r)^n$$ ...(5.24)

which is the compound interest formula for money.
Equation (5.24) involves four quantities:
$P_n$ : The value at the end of period n
$P_0$ : The value in the beginning
n : The length of the period
r : The rate per unit per period.
If we are given $P_0$, r and n we can compute $P_n$ by using (5.24) directly. However, (5.24) can be used to obtain any one of the four values when the remaining three values are given.
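To illustrate how (5.24) yields any one of the four quantities from the other three, here is a minimal sketch. The four function names are hypothetical; each is just an algebraic rearrangement of $P_n = P_0(1+r)^n$.

```python
import math

def final_value(p0, r, n):
    return p0 * (1 + r) ** n                      # P_n from P_0, r, n

def initial_value(pn, r, n):
    return pn / (1 + r) ** n                      # P_0 from P_n, r, n

def rate(p0, pn, n):
    return (pn / p0) ** (1 / n) - 1               # r from P_0, P_n, n

def periods(p0, pn, r):
    return math.log(pn / p0) / math.log(1 + r)    # n from P_0, P_n, r

print(round(final_value(100, 0.10, 2), 2))   # Rs. 100 at 10% for 2 periods
print(round(rate(1, 2, 4), 4))               # rate at which a value doubles in 4 periods
```

The rate is expressed per unit (0.10 means 10 per cent per period), matching the definition of r in the text.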


5.9.3. Average Rate of a Variable which Increases by Different Rates at Different Periods. Let us suppose that instead of the values of the variable increasing at a constant rate in each period, the rate per unit per period is different, say, $r_1, r_2, \ldots, r_n$ for the 1st, 2nd, ... and nth period respectively. Then, as discussed in the previous section, we shall get:

$P_1$ = The value at the end of 1st period = $P_0(1+r_1)$
$P_2$ = The value at the end of 2nd period = $P_0(1+r_1)(1+r_2)$
$P_n$ = The value at the end of period n = $P_0(1+r_1)(1+r_2)\cdots(1+r_n)$ ...(*)

If r is assumed to be the constant rate of growth per unit per period, then we get

$$P_n = P_0(1+r)^n$$ ...(**)

Hence, equating the values of $P_n$ in (*) and (**), the average rate of growth over the period n is given by:

$$(1+r)^n = (1+r_1)(1+r_2)\cdots(1+r_n) \;\Rightarrow\; 1 + r = \left[(1+r_1)(1+r_2)\cdots(1+r_n)\right]^{1/n}$$ ...(5.25)

If $r_1, r_2, \ldots, r_n$ denote the percentage growth per unit per period for the n periods respectively, then we have

$$1 + \frac{r}{100} = \left[\left(1 + \frac{r_1}{100}\right)\left(1 + \frac{r_2}{100}\right)\cdots\left(1 + \frac{r_n}{100}\right)\right]^{1/n}$$ ...(5.26)

where r is the average percentage growth rate over n periods.

$$100 + r = \left[(100 + r_1)(100 + r_2)\cdots(100 + r_n)\right]^{1/n} \;\Rightarrow\; r = \left[(100 + r_1)(100 + r_2)\cdots(100 + r_n)\right]^{1/n} - 100$$ ...(5.26a)

Thus we see that if rates are given as percentages, then the average percentage growth rate can be obtained on subtracting 100 from the G.M. of $(100 + r_1), (100 + r_2), \ldots, (100 + r_n)$.

Remark. It should be clearly understood that the average percentage growth rate is given by (5.26) and not by the geometric mean of $r_1, r_2, \ldots, r_n$.
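The Remark can be made concrete with a short sketch of (5.26a). The function name `average_growth_rate` is an assumption made for illustration; the rates of 20%, 30% and 40% are those of Example 5.38 in this chapter.

```python
import math

def average_growth_rate(percent_rates):
    """Average percentage growth rate per period, formula (5.26a)."""
    n = len(percent_rates)
    product = math.prod(100 + r for r in percent_rates)
    return product ** (1 / n) - 100

rates = [20, 30, 40]
correct = average_growth_rate(rates)               # G.M. of 120, 130, 140, minus 100
naive = math.prod(rates) ** (1 / len(rates))       # G.M. of the raw rates (not appropriate)
print(round(correct, 2), round(naive, 2))
```

The two figures differ, which is exactly the point of the Remark: the geometric mean must be taken of the values (100 + r), not of the rates themselves.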


Example 5.34. Find the Geometric Mean of 2, 4, 8, 12, 16 and 24. (Delhi U. B. Com. 1982)

Solution.

X      :  2       4       8       12      16      24      Total
log X  :  0.3010  0.6021  0.9031  1.0792  1.2041  1.3802  5.4697

$$\log(\text{G.M.}) = \frac{1}{n}\sum \log X = \frac{5.4697}{6} = 0.9116$$ [Using (5.21a)]

G.M. = Antilog (0.9116) = 8.158

Aliter. G.M. = $(2 \times 4 \times 8 \times 12 \times 16 \times 24)^{1/6} = (294912)^{1/6}$

$$\therefore \log \text{G.M.} = \frac{1}{6}\log 294912 = \frac{5.4698}{6} = 0.9116$$

⇒ G.M. = Antilog (0.9116) = 8.158
Example 5.35. Calculate arithmetic mean and geometric mean of the following distribution:

x :  2  3  4  5  6  7  8
f :  2  4  6  2  3  2  1
(Bombay U. B. Com. Nov. 1982)

Solution.
COMPUTATION OF A.M. AND G.M.

x       f       fx       log x     f log x
2       2       4        0.3010    0.6020
3       4       12       0.4771    1.9084
4       6       24       0.6021    3.6126
5       2       10       0.6990    1.3980
6       3       18       0.7782    2.3346
7       2       14       0.8451    1.6902
8       1       8        0.9031    0.9031
Total   Σf=20   Σfx=90             Σ f log x = 12.4489

$$\text{Arithmetic Mean} = \frac{\sum fx}{\sum f} = \frac{90}{20} = 4.5$$

$$\text{Geometric Mean} = \text{Antilog}\left(\frac{\sum f \log x}{N}\right) = \text{Antilog}\left(\frac{12.4489}{20}\right) = \text{Antilog}(0.6224) = 4.192$$


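Example 5.36. Find the geometric mean for the following distribution:

Marks           :  0-10  10-20  20-30  30-40  40-50
No. of students :  5     7      15     25     8

Solution.

Marks    Mid-value (X)   f      log X     f log X
0-10     5               5      0.6990    3.4950
10-20    15              7      1.1761    8.2327
20-30    25              15     1.3979    20.9685
30-40    35              25     1.5441    38.6025
40-50    45              8      1.6532    13.2256
Total                    N=60             Σ f log X = 84.5243

$$\text{Geometric mean} = \text{Antilog}\left[\frac{1}{N}\sum f \log X\right] = \text{Antilog}\left[\frac{84.5243}{60}\right] = \text{Antilog}[1.40874] = 25.64 \text{ marks.}$$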
Example 5.37. Three groups of observations contain 8, 7 and 5 observations. Their geometric means are 8.52, 10.12 and 7.75 respectively. Find the geometric mean of the 20 observations in the single group formed by pooling the three groups.
[I.C.W.A. (Intermediate) Dec. 1976]

Solution. In the usual notations, we are given that:
$n_1 = 8$, $n_2 = 7$, $n_3 = 5$
$G_1 = 8.52$, $G_2 = 10.12$, $G_3 = 7.75$

The geometric mean G of the combined group of size $N = n_1 + n_2 + n_3 = 8 + 7 + 5 = 20$ is given by the formula:

$$\log G = \frac{1}{N}\left[n_1 \log G_1 + n_2 \log G_2 + n_3 \log G_3\right]$$
$$= \frac{1}{20}\left[8 \log 8.52 + 7 \log 10.12 + 5 \log 7.75\right]$$
$$= \frac{1}{20}\left[8 \times 0.9304 + 7 \times 1.0052 + 5 \times 0.8893\right]$$
$$= \frac{1}{20}\left[7.4432 + 7.0364 + 4.4465\right] = \frac{18.9261}{20} = 0.9463$$

⇒ G = Antilog (0.9463) = 8.837
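The pooling formula (5.23a) used in Example 5.37 can be sketched as follows. The function name `combined_gm` is hypothetical and base-10 logarithms are assumed.

```python
import math

def combined_gm(sizes, gms):
    """Geometric mean of pooled groups from group sizes and group G.M.s (5.23a)."""
    log_g = sum(n * math.log10(g) for n, g in zip(sizes, gms)) / sum(sizes)
    return 10 ** log_g

g = combined_gm([8, 7, 5], [8.52, 10.12, 7.75])
print(round(g, 3))
```

Note that the individual observations are never needed: the group sizes and group geometric means suffice, which is what merit (iii) means by "suitable for further mathematical treatment".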
Example 5.38. Find the average rate of increase in population which in the first decade had increased by 20%, in the next by 30% and in the third by 40%.
[Delhi U. B. Com. (Hons.) 1980]

Solution. Since we are dealing with rate of increase in population, the appropriate average to be computed is the geometric mean and not the arithmetic mean.

CALCULATIONS FOR GEOMETRIC MEAN

Decade   Rate of growth    Population at the end    log X
         of population     of the decade (X)
1        20%               120                      2.0792
2        30%               130                      2.1139
3        40%               140                      2.1461
                                                    Σ log X = 6.3392

$$\text{G.M.} = \text{Antilog}\left(\frac{1}{n}\sum \log X\right) = \text{Antilog}\left(\frac{6.3392}{3}\right) = \text{Antilog}(2.1131) = 129.7$$

Hence, the average percentage rate of increase in the population per decade over the entire period is 129.7 - 100 = 29.7%.
Example 5.39. Under what condition is geometric mean indeterminate?
If the price of a commodity doubles in a period of 4 years, what is the average annual percentage increase?
[Delhi Uni. B.A. (Econ. Hons.) 1982; A.I.M.A. (Diploma in Management) May 1978]

Solution. Geometric mean is indeterminate if any one of the given observations is negative. In this case G.M. becomes imaginary. In general, G.M. is indeterminate (imaginary) if an odd number of the given observations is negative.
Also, if any one of the given observations is zero, G.M. becomes zero irrespective of the size of the other observations.

In the usual notations we are given:
$P_0$ = Rs. x (say); $P_n$ = Rs. 2x; n = 4.
If r is the average annual rate of increase (per unit) in price over this period, then we have:

$$P_n = P_0(1+r)^n \;\Rightarrow\; 2x = x(1+r)^4 \;\Rightarrow\; (1+r)^4 = 2 \;\Rightarrow\; 1 + r = 2^{1/4}$$

$$\Rightarrow 1 + r = \text{Antilog}\left(\tfrac{1}{4}\log 2\right) = \text{Antilog}\left(\frac{0.3010}{4}\right) = \text{Antilog}(0.07525) = 1.190$$

⇒ r = 1.190 - 1 = 0.19

Hence the average annual percentage increase is 0.19, i.e., 19%.
Example 5.40. An assessee depreciated the machinery of his factory by 10% each in the first two years and by 40% in the third year and thereby claimed 21% average depreciation relief from the taxation department, but the I.T.O. objected and allowed only 20%. Show which of the two is right.
[Guru Nanak Dev. Uni. B. Com. Sept. 1976; Punjab Uni. B. Com. Sept. 1976]

Solution.
COMPUTATION OF A.M. AND G.M.

Rate of depreciation    Value at the end of the year (X)    log X
10%                     90                                  1.9542
10%                     90                                  1.9542
40%                     60                                  1.7782
                                                            Σ log X = 5.6866

The average value (arithmetic mean) at the end of three years is:

$$\frac{90 + 90 + 60}{3} = \frac{240}{3} = 80$$

Hence the average rate of depreciation per annum for the entire period of 3 years is 100 - 80 = 20%.

This can be computed otherwise by taking the average of 10%, 10%, 40%, which is:

$$\frac{10 + 10 + 40}{3} = \frac{60}{3} = 20\%$$

The geometric mean is given by:

$$\text{G.M.} = \text{Antilog}\left(\frac{1}{3}\sum \log X\right) = \text{Antilog}\left(\frac{5.6866}{3}\right) = \text{Antilog}(1.8955) = 78.61$$

Hence the average (geometric mean) rate of depreciation per annum for the entire period of three years is 100 - 78.61 = 21.39%, i.e., about 21%.

The assessee had claimed 21% depreciation using G.M. while the I.T.O. objected and allowed 20% depreciation using A.M.

Since we are dealing with rates, the arithmetic mean does not depict the average depreciation correctly. Geometric mean is the correct average to be used. Hence, the I.T.O. was wrong in not allowing 21% depreciation as claimed by the assessee.

5.9.4. Weighted Geometric Mean. If the different values $X_1, X_2, \ldots, X_n$ of the variable are not of equal importance and are assigned different weights, say, $W_1, W_2, \ldots, W_n$ respectively according to their degree of importance, then their weighted geometric mean G.M.(W) is given by

$$\text{G.M.}(W) = \left(X_1^{W_1} \times X_2^{W_2} \times \cdots \times X_n^{W_n}\right)^{1/N}$$ ...(5.27)

where $N = W_1 + W_2 + \cdots + W_n = \sum W$ is the sum of weights. Taking logarithm of both sides in (5.27) we get

$$\text{G.M.}(W) = \text{Antilog}\left[\frac{1}{N}\sum W \log X\right] = \text{Antilog}\left[\frac{\sum W \log X}{\sum W}\right]$$ ...(5.27a)

Example 5.41. The weighted geometric mean of the four numbers 8, 25, 19 and 28 is 22.15. If the weights of the first three numbers are 3, 5 and 7 respectively, find the weight (positive integer) of the fourth number.
[I.C.W.A. (Intermediate) Dec. 1980]

Solution. Let the weight of the fourth number be w.

COMPUTATION OF WEIGHTED G.M.

X       W       log X     W log X
8       3       0.9031    2.7093
25      5       1.3979    6.9895
19      7       1.2788    8.9516
28      w       1.4472    1.4472w
Total   15+w              18.6504 + 1.4472w

The weighted geometric mean G is given by:

$$\log G = \frac{\sum W \log X}{\sum W} \;\Rightarrow\; \log(22.15) = \frac{18.6504 + 1.4472w}{15 + w}$$

(15 + w) x 1.3454 = 18.6504 + 1.4472w
15 x 1.3454 + 1.3454w = 18.6504 + 1.4472w
20.1810 + 1.3454w = 18.6504 + 1.4472w
1.4472w - 1.3454w = 20.1810 - 18.6504
0.1018w = 1.5306
w = 1.5306 ÷ 0.1018 = 15 approx.
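The weighted geometric mean (5.27a) and the search for the unknown weight can be sketched as below. Two assumptions are made here and should be read as such: the function name `weighted_gm` is hypothetical, and the first number (8) and the weights (3, 5, 7) are the values implied by the working of Example 5.41 (2.7093 = 3 log 8, 6.9895 = 5 log 25, 8.9516 = 7 log 19).

```python
import math

def weighted_gm(values, weights):
    """Weighted geometric mean, formula (5.27a)."""
    log_sum = sum(w * math.log10(x) for x, w in zip(values, weights))
    return 10 ** (log_sum / sum(weights))

target = 22.15
# Try every positive integer weight up to 50 and keep the one closest to the target.
best_w = min(range(1, 51),
             key=lambda w: abs(weighted_gm([8, 25, 19, 28], [3, 5, 7, w]) - target))
print(best_w, round(weighted_gm([8, 25, 19, 28], [3, 5, 7, best_w]), 2))
```

A brute-force search is used instead of solving the linear equation in w because the answer is required to be a positive integer; the algebraic solution in the text gives the same value after rounding.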


5.10. Harmonic Mean. If $X_1, X_2, \ldots, X_n$ is a given set of n observations, then their harmonic mean, abbreviated as H.M. or simply H, is given by:

$$H = \frac{1}{\dfrac{1}{n}\left(\dfrac{1}{X_1} + \dfrac{1}{X_2} + \cdots + \dfrac{1}{X_n}\right)} = \frac{1}{\dfrac{1}{n}\sum \dfrac{1}{X}}$$ ...(5.28)

In other words, Harmonic Mean is the reciprocal of the arithmetic mean of the reciprocals of the given observations.

In case of a frequency distribution we have:

$$\frac{1}{H} = \frac{1}{N}\left[\frac{f_1}{X_1} + \frac{f_2}{X_2} + \cdots + \frac{f_n}{X_n}\right] = \frac{1}{N}\sum \frac{f}{X} \;\Rightarrow\; H = \frac{N}{\sum (f/X)}$$ ...(5.28a)

where $N = \sum f$ is the total frequency, X is the value of the variable or the mid-value of the class (in case of grouped or continuous frequency distribution) and f is the corresponding frequency of X.
5.10.1. Merits and Demerits of Harmonic Mean.
Merits: (i) Harmonic mean is rigidly defined.
(ii) It is based on all the observations.
(iii) It is suitable for further mathematical treatment.

If $H_1$ and $H_2$ are the harmonic means of two series of sizes $N_1$ and $N_2$ respectively, then the harmonic mean H of the combined series of size $N_1 + N_2$ is given by:

$$\frac{1}{H} = \frac{1}{N_1 + N_2}\left[\frac{N_1}{H_1} + \frac{N_2}{H_2}\right]$$ ...(5.28b)

(iv) Since the reciprocals of the values of the variable are involved, it gives greater weightage to smaller observations and as such is not very much affected by one or two big observations.

(v) It is not affected very much by fluctuations of sampling.

(vi) It is particularly useful in averaging special types of rates and ratios where time factor is variable and the act being performed remains constant.

Demerits. (i) It is not easy to understand and calculate.

(ii) Its value cannot be obtained if any one of the observations is zero.

(iii) It is not a representative figure of the distribution unless the phenomenon requires greater weightage to be given to smaller items. As such, it is hardly used in business problems.

Uses. As has been pointed out in merit (vi) above, harmonic mean is specially useful in averaging rates and ratios where time factor is variable and the act being performed, e.g., distance, is constant. The following examples will clarify the point.


Example 5.42. The following table gives the weights of 31 persons in a sample enquiry. Calculate the mean weight using (i) Geometric mean and (ii) Harmonic mean.

Weight (lbs)   :  130  135  140  145  146  148  149  150  157
No. of persons :  3    4    6    6    3    5    2    1    1
(Mysore Uni. B. Com. April 1981)

Solution.
COMPUTATION OF G.M. AND H.M.

From the computation table, N = Σf = 31, Σ f log X = 66.7710 and Σ (f/X) = 0.21776.

$$\text{G.M.} = \text{Antilog}\left(\frac{1}{N}\sum f \log X\right) = \text{Antilog}\left(\frac{66.7710}{31}\right) = \text{Antilog}(2.1539) = 142.5$$

$$\text{H.M.} = \frac{N}{\sum (f/X)} = \frac{31}{0.21776} = 142.36$$

Hence the mean weight of 31 persons using (i) geometric mean is 142.5 lbs and (ii) harmonic mean is 142.36 lbs.
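The two totals quoted in Example 5.42 can be recomputed from the data. The sketch below applies (5.22a) and (5.28a) directly; the variable names are illustrative only.

```python
import math

weights_lbs = [130, 135, 140, 145, 146, 148, 149, 150, 157]
persons = [3, 4, 6, 6, 3, 5, 2, 1, 1]
n = sum(persons)                                                    # N = 31

sum_f_log_x = sum(f * math.log10(x) for x, f in zip(weights_lbs, persons))
sum_f_over_x = sum(f / x for x, f in zip(weights_lbs, persons))

gm = 10 ** (sum_f_log_x / n)                                        # formula (5.22a)
hm = n / sum_f_over_x                                               # formula (5.28a)
am = sum(f * x for x, f in zip(weights_lbs, persons)) / n           # for comparison
print(round(am, 2), round(gm, 2), round(hm, 2))
```

Small differences from the printed figures in the second decimal place are to be expected, since the text works with four-figure log tables and rounded reciprocals.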


Example 5.43. A cyclist pedals from his house to his college at a speed of 10 km. p.h. and back from the college to his house at 15 km. p.h. Find the average speed.

Solution. Let the distance from the house to the college be x kms.

In going from house to college, the distance (x kms) is covered in x/10 hours, while in coming from college to house, the distance is covered in x/15 hours. Thus a total distance of 2x kms is covered in (x/10 + x/15) hours.

$$\text{Hence average speed} = \frac{\text{Total distance travelled}}{\text{Total time taken}} = \frac{2x}{\dfrac{x}{10} + \dfrac{x}{15}} = \frac{2}{\dfrac{1}{10} + \dfrac{1}{15}} = 12 \text{ km. p.h.}$$
Remarks. 1. In this case the average speed is given by the harmonic mean of 10 and 15 and not by the arithmetic mean.

2. If equal distances are covered (travelled) per unit of time with speeds equal to $V_1, V_2, \ldots, V_n$, say, then the average speed is given by the harmonic mean of $V_1, V_2, \ldots, V_n$, i.e.,

$$\text{Average speed} = \frac{n}{\dfrac{1}{V_1} + \dfrac{1}{V_2} + \cdots + \dfrac{1}{V_n}} = \frac{n}{\sum \dfrac{1}{V}}$$
Example 5.44. In a certain factory, a unit of work is completed by A in 4 minutes, by B in 5 minutes, by C in 6 minutes, by D in 10 minutes and by E in 12 minutes. What is their average rate of working? What is the average number of units of work completed per minute? At this rate how many units will they complete in a six-hour day? [Delhi Uni. B. Com. (Hons.) 1976]

Solution. Average rate of working is the harmonic mean of 4, 5, 6, 10 and 12 and is given by:

$$\frac{5}{\dfrac{1}{4} + \dfrac{1}{5} + \dfrac{1}{6} + \dfrac{1}{10} + \dfrac{1}{12}} = \frac{5}{\left(\dfrac{15 + 12 + 10 + 6 + 5}{60}\right)} = \frac{5 \times 60}{48} = \frac{25}{4} = 6.25 \text{ minutes per unit.}$$

Hence the average number of units of work completed per minute is 4/25.

Therefore, at this rate the number of units that will be completed in a six-hour (360 minutes) day are

$$360 \times \frac{4}{25} = 57.6 \text{ units.}$$

Example 5.45. An investor buys Rs. 1,200 worth of shares in a company each month. During the first 5 months he bought the shares at a price of Rs. 10, Rs. 12, Rs. 15, Rs. 20 and Rs. 24 per share. After 5 months what is the average price paid for the shares by him?
[Delhi Uni. B. Com. (Hons.) 1971]

Solution. Since the share value is changing after a fixed unit of time (1 month), the required average price per share is the harmonic mean of 10, 12, 15, 20 and 24 and is given by:

$$\frac{5}{\dfrac{1}{10} + \dfrac{1}{12} + \dfrac{1}{15} + \dfrac{1}{20} + \dfrac{1}{24}} = \frac{5}{\left(\dfrac{12 + 10 + 8 + 6 + 5}{120}\right)} = \frac{5 \times 120}{41} = \text{Rs. } 14.63$$
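The three harmonic-mean computations in Examples 5.43 to 5.45 can be reproduced exactly with rational arithmetic. This is a sketch only; the function name `harmonic_mean` is hypothetical and the inputs are assumed to be positive integers.

```python
from fractions import Fraction

def harmonic_mean(values):
    """H.M. of positive integers as an exact fraction, formula (5.28)."""
    return len(values) / sum(Fraction(1, v) for v in values)

speed = harmonic_mean([10, 15])                   # Example 5.43, km. p.h.
minutes_per_unit = harmonic_mean([4, 5, 6, 10, 12])   # Example 5.44
units_per_day = 360 / minutes_per_unit            # six-hour day of 360 minutes
price = harmonic_mean([10, 12, 15, 20, 24])       # Example 5.45, Rs. per share

print(speed, minutes_per_unit, float(units_per_day), round(float(price), 2))
```

Using `Fraction` keeps the common-denominator step of the hand calculation exact, so the results are 12, 25/4 and 600/41 with no rounding until the final display.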
5.10.2. Weighted Harmonic Mean. Instead of a fixed (constant) distance being travelled with varying speeds (c.f. Remark 2 to Example 5.43), let us now suppose that different distances are travelled with correspondingly different speeds. In that case what is going to be the average speed?

Let us suppose that distances $x_1, x_2, \ldots, x_n$ are travelled with speeds $v_1, v_2, \ldots, v_n$ per unit of time. Then

$$\text{Average speed} = \frac{x_1 + x_2 + \cdots + x_n}{\dfrac{x_1}{v_1} + \dfrac{x_2}{v_2} + \cdots + \dfrac{x_n}{v_n}} = \frac{\sum x}{\sum (x/v)}$$ ...(5.30)

which is the weighted harmonic mean of the speeds, the corresponding weights being the distances covered.

Example 5.46. You make a trip which entails travelling 900 kms. by train at an average speed of 60 km. p.h., 3000 kms. by boat at an average of 25 km. p.h., 400 kms. by plane at 350 km. p.h., and finally, 15 kms. by taxi at 25 km. p.h. What is your average speed for the entire distance?
[I.C.W.A. (Intermediate) June 1982]

Solution. Since different distances are covered with varying speeds, the required average speed is given by the weighted harmonic mean of the speeds (in km. p.h.) 60, 25, 350 and 25, the corresponding weights being the distances covered (in kms), viz., 900, 3000, 400 and 15 respectively.

COMPUTATION OF WEIGHTED H.M.

Speed (X)    Distance (W)    W/X
60           900             15.00
25           3000            120.00
350          400             1.14
25           15              0.60
Total        ΣW = 4315       Σ(W/X) = 136.74

$$\therefore \text{Average speed} = \frac{\sum W}{\sum (W/X)} = \frac{4315}{136.74} = 31.56 \text{ km. p.h.}$$
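A short sketch of the weighted harmonic mean follows, taking the distances as weights as in Example 5.46 (900, 3000, 400 and 15 kms at 60, 25, 350 and 25 km. p.h., the figures used in the solution). The function name `weighted_hm` is an assumption for illustration.

```python
def weighted_hm(speeds, distances):
    """Average speed = total distance / total time (weighted harmonic mean)."""
    total_distance = sum(distances)
    total_time = sum(d / s for s, d in zip(speeds, distances))
    return total_distance / total_time

speeds = [60, 25, 350, 25]          # km. p.h.
distances = [900, 3000, 400, 15]    # kms
avg_speed = weighted_hm(speeds, distances)
print(round(avg_speed, 2))
```

The time spent on the plane leg is 400/350, or about 1.14 hours, so the total time is about 136.74 hours for 4315 kms. The slow boat leg dominates the result even though the plane is fourteen times faster.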
5.11. Relation Between Arithmetic Mean, Geometric Mean and Harmonic Mean. The arithmetic mean (A.M.), the geometric mean (G.M.) and the harmonic mean (H.M.) of a series of n observations are connected by the relation:

$$\text{A.M.} \ge \text{G.M.} \ge \text{H.M.}$$ ...(5.31)

the sign of equality holding if and only if all the n observations are equal.

Remark. For two numbers we also have

$$G^2 = A \times H$$ ...(5.34)

where A, G and H represent arithmetic mean, geometric mean and harmonic mean respectively.

Example 5.47. "H.M., A.M. and G.M. of a set of 5 observations are 10.2, 16 and 14 respectively." Comment.
[I.C.W.A. (Final) December 1980]

Solution. We are given: n = 5, A.M. = 16, G.M. = 14 and H.M. = 10.2. Since A.M. > G.M. > H.M., the above statement is correct.
Example 5.48. The arithmetic mean of two observations is 127.5 and their geometric mean is 60. Find (i) their harmonic mean and (ii) the two observations.
[I.C.W.A. (Intermediate) June 1983]

Solution. (i) Let the two observations be a and b. Then we are given:

Mean = (a + b)/2 = 127.5 ⇒ a + b = 255 ...(*)
G.M. = √(ab) = 60 ⇒ ab = 3600 ...(**)

We have:

$(a - b)^2 = (a + b)^2 - 4ab = (255)^2 - 4 \times 3600 = 65025 - 14400 = 50625$

∴ a - b = ±√50625 = ±225 ...(***)

Case 1: a + b = 255, a - b = 225. Adding, we get 2a = 480 ⇒ a = 480/2 = 240, and b = 255 - a = 255 - 240 = 15 [From (*)].

Case 2: a + b = 255, a - b = -225. Adding, we get 2a = 30 ⇒ a = 30/2 = 15, and b = 255 - a = 255 - 15 = 240 [From (*)].

Hence (i) the two observations are 240 and 15.
(ii) Harmonic mean of two numbers a and b is given by:

$$\text{H.M.} = \frac{2ab}{a + b} = \frac{2 \times 3600}{255} = \frac{480}{17} = 28.24$$
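The steps of Example 5.48 can be sketched in code as a check on the relation $G^2 = A \times H$ for two numbers. The function name `two_numbers` is hypothetical.

```python
import math

def two_numbers(am, gm):
    """Recover two observations from their arithmetic and geometric means."""
    total = 2 * am                                   # a + b
    product = gm ** 2                                # a * b
    diff = math.sqrt(total ** 2 - 4 * product)       # |a - b|
    return (total + diff) / 2, (total - diff) / 2

a, b = two_numbers(127.5, 60)
hm = 2 * a * b / (a + b)
print(a, b, round(hm, 2))
print(math.isclose(60 ** 2, 127.5 * hm))             # G^2 = A x H
```

The square root is real only when A.M. ≥ G.M., which is consistent with relation (5.31).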

5.12. Selection of an Average. From the discussion of the merits and demerits of the various measures of central tendency in the preceding sections, it is obvious that no single average is suitable for all practical problems. Each of the averages has its own merits and demerits and consequently its own field of importance and utility. For example, arithmetic mean is not to be recommended while dealing with a frequency distribution with extreme observations or open end classes. Median and mode are the averages to be used while dealing with open end classes. In case of qualitative data which cannot be measured quantitatively (e.g., for finding average intelligence, honesty, beauty, etc.) median is the only average to be used. Mode is particularly used in business, and geometric mean is to be used while dealing with rates and ratios. Harmonic mean is to be used in computing special types of average rates or ratios where time factor is variable and the act being performed, e.g., distance, is constant.

Hence, the averages cannot be used indiscriminately. For sound statistical analysis, a judicious selection of the average depends upon:

(i) the nature and availability of the data,

(ii) the nature of the variable involved,

(iii) the purpose of the enquiry,

(iv) the system of classification adopted and

(v) the use of the average for further statistical computations required for the enquiry in mind.

However, since arithmetic mean:

(i) satisfies almost all the properties of an ideal average as laid down by Prof. Yule,

(ii) is quite familiar to a layman and

(iii) has very wide applications in statistical theory at large,

it may be regarded as the best of all the averages.

5.13. Limitations of Averages. In spite of their very wide applications in statistical analysis, the averages have the following limitations:

1. Since average is a single numerical figure representing the characteristics of a given distribution, proper care should be taken in interpreting its value, otherwise it might lead to very misleading conclusions. In this context, it might be appropriate to quote a classical joke regarding average about a village school teacher who had to cross a river along with his family. On enquiry he was given to understand that the average depth of the river was 3 feet. He [...] wife, 2 daughters and 3 sons) and found that their average (mean) height was 3½ feet. Since the average height of the family came out to be higher than the average depth of the river, he ordered his family to cross the river. But when he reached the other side of [...]

'Araba jyon ka tyon
Kunba dooba kyon'

(roughly: "The arithmetic stands as it was; why then did the family drown?")

[...] was as deep as 4 feet or so. Accordingly, the members of the family with height below 4 feet were drowned.

2. A proper and judicious choice of an average for a particular problem is very important. A wrong choice of the average might give wrong and fallacious conclusions.

3. [...] distribution. We might come across a number of distributions having the same average but differing widely in their structure and constitution [...]

4. In certain types of distributions like U-shaped or J-shaped distributions, an average (which is only a single point of concentration) fails to represent the entire series. [c.f. Chapter 4]

5. Sometimes an average might give very absurd results. For instance, the average size of a family might come out in fractions, which is obviously absurd. In this context we might quote the following:

"The figure of 2.2 children per adult female is felt in some respects to be absurd and the Royal Commission suggested that the middle classes be paid money to increase the average to a rounder and more convenient number".
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EXERCISE 5.4 


1. Define Geometric Mean and discuss its merits and demerits. Give 
two practical situations where you will recommend its use. 
2. (a) “It is said that the choice of an average depends on the particular 
problem in hand". 
Examine the above statement and give at least one instance each for the 
use of Mode and Geometric Mean. 
[Delhi U. B.Com. (Hons.) 1979 


(6) Discuss the strong and weak points of various measures of central 
tendency. 
(Lucknow U. B.Com. 1982) 


3. (а) What are the advantages and disadvantages of the chief averages 
used in Statistics ? Indicate their special uses if any. 
(Kurukshetra U. B.Com. Sept. 1980) 


b) What are the desiderata for a Statisfactory average ? Examine the 
geometric mean in the light of these desiderata and bring out the special proper- 
ties of this average which lend to its use in intercensal population counts and 
in the construction of index numbers. 

[4.1.М.А. (Diploma in Management) May 1978] 
4. (а) “Each average has its own Special features and it is difficult to 

say which one is the best". Explain and illustrate. 
[Punjab U. B.A, (Econ. Hons.) 1981] 


(b) Why is arithmetic mean generally preferred over median as the
measure of central tendency ? What is the relation between arithmetic mean
and geometric mean ? When is the latter preferred over the former ?
[Delhi U. B.A. (Econ. Hons.) 1979]


5. Explain the relative merits of geometric mean over other measures
of central tendency.
[Osmania U. B.Com. (Hons.) April 1983]


6. Give a specific example of an instance in which :
(a) The median would be used in preference to arithmetic mean ;
(b) The arithmetic mean would not be as satisfactory as the geometric
mean ; and
(c) Mode would be used in preference to the median.
(Kerala Uni. B.Com., April 1977)
7. (a) Find the geometric mean of 1.05, 1.08 and 1.77.
Ans. 1.26
(b) Find the geometric mean of the following :
1, 7, 18, 65, 91 and 103.
(Delhi U. B.Com. 1983)
Ans. 20.62
(c) Calculate Geometric Mean of the following data :
1, 7, 29, 92, 115 and 375.
(Delhi Uni. B.Com. III, 1984)
Ans. 30.50
(d) Monthly income of ten families of a particular place is given below.
Find out Geometric Mean :
85, 70, 15, 75, 500, 8, 45, 250, 40, 36
(Guru Nanak Dev Uni. B.Com. II Sept. 1982)
Ans. 58.03

8. (a) The price of a commodity doubles in a period of 5 years. What
will be the average rate of increase per annum ?
(Punjab Uni. B.Com. Sept. 1977)
Ans. 14.9%

(b) If the population has doubled itself in twenty years, is it correct to
say that the rate of growth has been 5% per annum ?
[Delhi U. B.A. (Econ. Hons.) 1980, '76]
Ans. No. r = 3.5%.
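
The two answers to Question 8 can be checked numerically. The sketch below is only an illustration and not part of the original exercise: the helper name `average_growth_rate` is hypothetical, and it assumes that growth compounds once per year, so that the average rate r satisfies (1 + r)^n = (final value)/(initial value).

```python
def average_growth_rate(growth_ratio, periods):
    """Average compound rate per period implied by an overall growth ratio.

    Solves (1 + r) ** periods == growth_ratio for r, i.e. the geometric
    mean of the period-by-period growth factors, minus one.
    """
    return growth_ratio ** (1.0 / periods) - 1.0


# Question 8(a): the price doubles in 5 years.
rate_5_years = average_growth_rate(2.0, 5)
# Question 8(b): the population doubles in 20 years.
rate_20_years = average_growth_rate(2.0, 20)

print(f"8(a): {rate_5_years:.1%} per annum")
print(f"8(b): {rate_20_years:.1%} per annum (not 5%)")
```

Dividing the total growth of 100% by 20 years gives 5%, but that figure ignores compounding; the geometric-mean rate is the one that reproduces the doubling.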


9. The population of India in 1951 and 1961 was 361 and 439 million
respectively.

(i) What was the average percentage increase per year during the
period ?

(ii) If the average rate of increase from 1961 to 1971 remains the same,
what would be the population in 1971 ?
(Guru Nanak Dev U. B.Com. Sept. 1981)
Ans. (i) 2%, (ii) 533.85 million.


10. The population of a country increased by 20 per cent in the first
decade, by 30 per cent in the second decade and by 45 per cent in the third
decade. Determine the average decennial growth rate of population.
[Delhi Uni. B.A. (Econ. Hons. I) 1983]


11. The rates of increase in population of a country during the
last three decades are 10%, 20% and 30%. Find the average rate of growth
during the last three decades.

Ans. 19.7%


12. A machine depreciates by 40% in the first year, by 25% in the
second year and by 10% per annum for the next three years, each percentage
being calculated on the diminishing value. What is the average percentage of
depreciation for the entire period ?
[Punjab Uni. B.Com. April 1977]
Ans. 20%.

13. An economy grows at the rate of 2% in the first year, 2.5% in the
second year, 3% in the third, 4% in the fourth, ... and 10% in the tenth year.
What is the average rate of growth of the economy ?
[Delhi Uni. B.A. Eco. (Hons.) 1978]
Ans. 5.6% p.a.


14. If arithmetic mean and geometric mean of two values are 10 and 8 
respectively, find the values. 


Ans. 16, 4 


15. A man gets three successive annual rises in salary of 20%, 30% and
25% respectively, each percentage being reckoned on his salary at the end of
the previous year. How much better or worse off would he have been if he had
been given three annual rises of 25% each, reckoned in the same way ?
[I.C.W.A. (Intermediate) Feb. 1982]

Ans. The man would be better off in the second case by 0.31% of his
starting salary in the first year.


16. The geometric mean of 4 items is 100 and of another 8 items is
3.162. Find the geometric mean of the 12 items.
[Delhi Uni. B.A. (Econ. Hons. I) (O.C.), 1983]
Ans. 10.

17. (a) Define simple and weighted geometric mean of a given distribution.
The weighted geometric mean of three numbers 229, 275 and 125 is 203.
The weights for the 1st and 2nd numbers are 2 and 4 respectively. Find the
weight of the third.

(b) The weighted geometric mean of the four numbers 9, 25, 17 and 30
is 15.3. If the weights of the first three numbers are 5, 3 and 4
respectively, find the weight of the fourth number.


Ans. 2 (approx.) 

18. (a) Define Harmonic Mean and discuss its merits and demerits.
Under what situations would you recommend its use ?

(b) Find the geometric and harmonic mean from the following data :

Item  :  1    2     3      4     5      6       7     8      9      10
Value :  15   250   15.7   157   1.57   105.7   105   1.06   25.7   0.257
[Punjab Uni. B.A. (Econ. Hons. II) April 1984]

Ans. G.M. = 16.04 ; H.M. = 1.7637

(c) Find out Harmonic Mean from the following :
2574, 465, 75, 5, 0.8, 0.08, 0.005, 0.0009
(Guru Nanak Dev Uni. B.Com. II April 1982)

Ans. 0.00604.


(d) What do you mean by weighted Harmonic Mean ? When will you
use it instead of Simple Harmonic Mean ? Explain by a practical situation.


19. It is said that the choice of an average depends on the particular
problem in hand. Comment on this and indicate at least one instance of the
use of median, geometric mean and harmonic mean.
[C.A. (Intermediate) Nov. 1972 (O.S.)]

20. From the following statements select any two which are correct and
any three which are incorrect. In respect of each of such statements selected
by you, give your comments explaining briefly why you consider the statement
correct or incorrect :

(i) The median may be considered more typical than the mean because
the median is not affected by the size of the extremes ;

(ii) In a frequency distribution the true value of the mode cannot be
calculated exactly ;

(iii) The Geometric Mean cannot be used in the averaging of index
numbers because it gives undue importance to small numbers ;

(iv) The base of any weighted index number cannot be shifted with
accuracy without recalculating the entire series except when the average
used is the Geometric Mean ;

Ans. (i) T, (ii) T, (iii) F, (iv) T.
21. An aeroplane flies around a square the sides of which measure 100
km. each. The aeroplane covers at a speed of 100 km. per hour the first side,
at 200 km. per hour the second side, at 300 km. per hour the third side and at
400 km. per hour the fourth side. Use the correct mean to find the average
speed around the square.
[I.C.W.A. (Intermediate) June 1978]

Ans. 192 km. p.h.
22. A man climbs up a slope at a speed of 5 km. an hour and descends
it at a speed of 8 km. per hour. If the distance covered each way is 10 km.,
find the average speed for the entire journey.
[Punjab Uni. B.A. (Econ. Hons.) 1978]

Ans. 6.154 km. p.h.

23. A railway train runs for 30 minutes at a speed of 40 miles an hour
and then, because of repairs of the track, runs for 10 minutes at a speed of
8 miles an hour, after which it resumes its previous speed and runs for 20
minutes except for a period of 2 minutes when it had to run over a bridge
with a speed of 30 miles per hour. What is the average speed ?
(Punjab Uni. B.Com. Oct. 1977)

Ans. 34.33 m.p.h.

24. A cyclist covers his first three kilometres at an average speed of 8
kms. per hour, another 2 kms. at 9 kms. per hour and the last 2 kms. at 4 kms.
per hour. Find the average speed for the entire journey.
(Sambalpur Uni. B.Com., 1982)

Ans. 6.38 kms. per hour.
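
The average-speed answers to Questions 22 and 24 follow from dividing total distance by total time. The sketch below is an illustration added for checking only; the helper name `average_speed` is hypothetical. When the stages are of equal length the result coincides with the simple harmonic mean of the speeds, and when they are unequal the distances act as weights.

```python
from statistics import harmonic_mean


def average_speed(distances, speeds):
    """Total distance divided by total time for a journey made in stages."""
    total_distance = sum(distances)
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return total_distance / total_time


# Question 22: equal distances, so the simple harmonic mean applies.
slope_speed = average_speed([10, 10], [5, 8])
simple_hm = harmonic_mean([5, 8])

# Question 24: unequal distances act as weights (weighted harmonic mean).
cyclist_speed = average_speed([3, 2, 2], [8, 9, 4])

print(round(slope_speed, 3), round(simple_hm, 3), round(cyclist_speed, 2))
```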


25. A man travels from Agra to Dehradun covering 204 miles at a
mileage rate of 10 miles per gallon of petrol and via Ghaziabad with an
additional journey of 40 miles at the rate of 15 miles per gallon. Find the
average mileage per gallon.
(Guru Nanak Dev Uni. B.Com. Sept. 1976)

Ans. 10.58 miles per gallon.

26. You have spent one rupee for 3 dozen oranges in one shop, another
rupee for 4 dozen oranges and still another rupee for 5 dozen oranges in two
other different stores. What is the "average price" per dozen of oranges ?


(Kurukshetra Uni. B. Com. II, April 1982) 
Ans. Rs. 0.25.
27. If a person spends Rs. 60 on books priced Rs. 4 each, Rs. 60 on books
priced Rs. 5 each and Rs. 60 on books priced Rs. 6 each, can it be said that
the average price of the books purchased by him is Rs. 5 ? Why ?
(Bombay Uni. B.Com. 1977)

Ans. Rs. 4.87 (Weighted Harmonic Mean).

28. In a certain office a letter is typed by A in 4 minutes. The same letter
is typed by B, C and D in 5, 6 and 10 minutes respectively. What is the
average time taken in completing one letter ? How many letters do you expect
to be typed in one day comprising 8 working hours ?

Ans. H.M. = 5.58 minutes per letter.
Letters typed in 8 hours (480 minutes) = 86


29. Define Arithmetic Mean, Harmonic Mean and Geometric Mean
for a set of n observations and state the relationship between them.

Ans. A ≥ G ≥ H ; the sign of equality holds if and only if all the
observations are equal.
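
The relationship A ≥ G ≥ H can be illustrated numerically. The following is a minimal sketch using Python's standard `statistics` module; the two small data sets are hypothetical and chosen only so that one has unequal values and the other has all values equal.

```python
from statistics import mean, geometric_mean, harmonic_mean

unequal = [2, 4, 8]   # hypothetical values that differ from one another
equal = [5, 5, 5]     # hypothetical values that are all the same

am, gm, hm = mean(unequal), geometric_mean(unequal), harmonic_mean(unequal)
print(f"unequal values: A = {am:.4f}, G = {gm:.4f}, H = {hm:.4f}")

am_eq, gm_eq, hm_eq = mean(equal), geometric_mean(equal), harmonic_mean(equal)
print(f"equal values:   A = {am_eq:.4f}, G = {gm_eq:.4f}, H = {hm_eq:.4f}")
```

For the unequal values the three means are strictly ordered; for the equal values they coincide, up to floating-point rounding.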


30. Calculate the Arithmetic Mean, G.M. and H.M. of the following
observations and show that A.M. > G.M. > H.M.

32, 35, 36, 37, 39, 41, 43.
Ans. A.M. = 37.57, G.M. = 37.41, H.M. = 37.25


Dispersion


6.1. Introduction and Meaning. Averages or the measures
of central tendency give us an idea of the concentration of the
observations about the central part of the distribution. In spite of
their great utility in statistical analysis, they have their own
limitations. If we are given only the average of a series of observations,
we cannot form a complete idea about the distribution since there
may exist a number of distributions whose averages are the same but
which may differ widely from each other in a number of ways. The
following example will illustrate this viewpoint.

Let us consider the following three series A, B and C of 9 items each.

Series                                          Total   Mean
A       15, 15, 15, 15, 15, 15, 15, 15, 15       135     15
B       11, 12, 13, 14, 15, 16, 17, 18, 19       135     15
C        3,  6,  9, 12, 15, 18, 21, 24, 27       135     15


All the three series A, B and C have the same size (n = 9)
and the same mean viz., 15. Thus, if we are given that the mean of a
series of 9 observations is 15 we cannot determine if we are talking
of the series A, B or C. In fact any series of 9 items with total 135
will give mean 15. Thus we may have a large number of series with
entirely different structures and compositions but having the same
mean.
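
The point of the illustration can be reproduced in a few lines. This is only a sketch for checking the table above; the spread is summarised here by the difference between the largest and smallest item, which is just one simple way of showing that the three series differ.

```python
from statistics import mean

series = {
    "A": [15] * 9,
    "B": list(range(11, 20)),      # 11, 12, ..., 19
    "C": list(range(3, 28, 3)),    # 3, 6, ..., 27
}

for name, values in series.items():
    spread = max(values) - min(values)   # largest item minus smallest item
    print(name, "total =", sum(values), "mean =", mean(values), "spread =", spread)
```

All three series report the same total and mean, while the spread differs from one series to the next.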

From the above illustration it is obvious that the measures of
central tendency are inadequate to describe the distribution
completely. In the words of George Simpson and Fritz Kafka :

"An average does not tell the full story. It is hardly fully
representative of a mass unless we know the manner in which the individual
items scatter around it. A further description of the series is necessary
if we are to gauge how representative the average is."

Thus the measures of central tendency must be supported
and supplemented by some other measures. One such measure is
Dispersion.

Literal meaning of dispersion is 'scatteredness'. We study
dispersion to have an idea of the homogeneity (compactness) or
heterogeneity (scatter) of the distribution. In the above illustration,
we say that the series A is stationary, i.e., it is constant and
shows no variability. Series B is slightly dispersed and series C is
relatively more dispersed. We say that series B is more homogeneous
(or uniform) as compared with series C or the series C is more
heterogeneous than series B.

We give below some definitions of dispersion as given by
different statisticians from time to time.


WHAT THEY SAY ABOUT DISPERSION: SOME
DEFINITIONS

"Dispersion is the measure of the variation of the items."

—A.L. Bowley 


“Dispersion is a measure of the extent to which the individual 
items vary.” —L.R. Connor 
“Dispersion or spread is the degree of the scatter or variation 
of the variables about a central value.” 
—B.C, Brooks and W.F.L. Dick 
“The degree to which numerical data tend to spread about an 
average value is called the variation or dispersion of the data.” 
—Spiegel 
"The term dispersion is used to indicate the fact that within a
given group, the items differ from one another in size or in other
words, there is lack of uniformity in their sizes."
—W.I. King 


6.2. Characteristics for an Ideal Measure of Dispersion.
The desiderata for an ideal measure of dispersion are the same as
those for an ideal measure of central tendency, viz. :
(i) It should be rigidly defined.
(ii) It should be easy to calculate and easy to understand.
(iii) It should be based on all the observations.
(iv) It should be amenable to further mathematical treatment.
(v) It should be affected as little as possible by fluctuations of
sampling.
(vi) It should not be affected much by extreme observations.


All these properties have been explained in the chapter on 
Measures of Central Tendency. 


6.3. Absolute and Relative Measures of Dispersion. The
measures of dispersion which are expressed in terms of the original
units of a series are termed as Absolute Measures. Such measures
are not suitable for comparing the variability of two distributions
which are expressed in different units of measurement. On
the other hand, Relative Measures of dispersion are obtained as
ratios or percentages and are thus pure numbers independent of the
units of measurement. For comparing the variability of two
distributions (even if they are measured in the same units), we compute
the relative measures of dispersion instead of absolute measures of
dispersion.


6.4. Measures of Dispersion. The various measures of
dispersion are :
(i) Range.
(ii) Quartile deviation or Semi-Interquartile range.
(iii) Mean deviation.
(iv) Standard deviation.
(v) Lorenz curve.

The first two measures viz., Range and Quartile deviation are
termed as positional measures since they depend upon the values of
the variable at particular positions of the distribution. The last
measure viz., Lorenz curve is a graphic method of studying variability.
In the following sections we shall discuss these measures in detail
one by one.


6.5. Range. The range is the simplest of all the measures
of dispersion. It is defined as the difference between the two extreme
observations of the distribution. In other words, range is the
difference between the greatest (maximum) and the smallest (minimum)
observation of the distribution. Thus

Range $= X_{max} - X_{min}$ ...(6.1)

where $X_{max}$ is the greatest observation and $X_{min}$ is the smallest
observation of the variable value.

In case of the grouped frequency distribution (for discrete
values) or the continuous frequency distribution, range is defined as
the difference between the upper limit of the highest class and the
lower limit of the lowest class.


Remarks 1. In case of a frequency distribution, the frequencies
of the various variate values (or classes) are immaterial since
range depends only on the two extreme observations.

2. Absolute and Relative Measures of Range. Range as defined
in (6.1) is an absolute measure of dispersion and depends upon the
units of measurement. Thus if we want to compare the variability
of two or more distributions with the same units of measurement, we
may use (6.1). However, to compare the variability of the distributions
given in different units of measurement we cannot use (6.1)
but we need a relative measure which is independent of the units of
measurement. This relative measure, called the coefficient of range,
is defined as follows :

Coefficient of Range $= \dfrac{X_{max} - X_{min}}{X_{max} + X_{min}}$ ...(6.2)

where the symbols have already been explained. In other words,
coefficient of range is the ratio of the difference between the two extreme
observations (biggest and smallest) of the distribution to their sum.

It is a common practice to use coefficient of range even for the
comparison of variability of distributions given in the same units of
measurement.
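
The difference between the absolute measure (6.1) and the relative measure (6.2) can be seen by expressing the same data in two units. The sketch below is illustrative only: the function names `value_range` and `coefficient_of_range` and the height figures are hypothetical, not taken from the text.

```python
def value_range(values):
    """Range as in (6.1): largest observation minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range as in (6.2): (max - min) / (max + min)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


heights_cm = [150.0, 160.0, 170.0, 180.0]       # hypothetical heights in cm
heights_in = [h / 2.54 for h in heights_cm]     # the same heights in inches

print(value_range(heights_cm), value_range(heights_in))
print(coefficient_of_range(heights_cm), coefficient_of_range(heights_in))
```

The range changes with the unit of measurement, whereas the coefficient of range is the same pure number in both cases.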

6.5.1 Merits and Demerits of Range. Range is the simplest
though crude measure of dispersion. It is rigidly defined, readily
comprehensible and is perhaps the easiest to compute, requiring
very little calculation. However, it does not satisfy the properties
(iii) to (vi) for an ideal measure of dispersion. We give below its
limitations and drawbacks.

(i) Range is not based on the entire set of data. It is based
only on the two extreme observations, which themselves are subject to
chance fluctuations. As such range cannot be regarded as a reliable
measure of variability.

(ii) Range is very much affected by fluctuations of sampling.
Its value varies very widely from sample to sample.

(iii) If the smallest and the largest observations of a distribution
are unaltered and all other values are replaced by a set of observations
within these values, i.e., between $X_{min}$ and $X_{max}$, the range of
the distribution remains the same. Moreover, if any item is added or deleted
on either side of the extreme value, the value of the range is changed
considerably, though its effect is not so pronounced if we use the
coefficient of range. Thus range does not take into account the
composition of the series or the distribution of the observations
within the extreme values. Consequently, it is fairly unreliable as a
measure of dispersion of the values within the distribution.

(iv) Range cannot be used if we are dealing with open end
classes.

(v) Range is not suitable for mathematical treatment.

(vi) Another shortcoming of the range, though less important,
is that it is very sensitive to the size of the sample. As the sample
size increases the range tends to increase, though not proportionately.

(vii) In the words of W. I. King, "Range is too indefinite to be
used as a practical measure of dispersion."


6.5.2 Uses. (1) In spite of the above limitations and shortcomings,
range, as a measure of dispersion, has its applications in
a number of fields where the data have small variations like the
stock market fluctuations, the variations in money rates and rate
of exchange.

(2) Range is used in industry for the statistical quality control
of the manufactured product by the construction of R-chart, i.e.,
the control chart for range.

(3) Range is by far the most widely used measure of variability
in our day-to-day life. For example, the answer to problems
like 'daily sales in a departmental store', 'monthly wages of workers
in a factory' or 'the expected return of fruits from an orchard' is
usually provided by the probable limits in the form of a range.

(4) Range is also used as a very convenient measure by the
meteorological department for weather forecasts since the public is
primarily interested to know the limits within which the temperature
is likely to vary on a particular day.


Example 6.1. Calculate the range and the coefficient of range of
A's monthly earnings for a year.

Month   Monthly earnings (Rs.)    Month   Monthly earnings (Rs.)
  1            139                  7            160
  2            150                  8            161
  3            151                  9            162
  4            151                 10            162
  5            157                 11            173
  6            158                 12            175

Solution.
Largest earnings = Rs. 175 ; Smallest earnings = Rs. 139
Range = 175 - 139 = Rs. 36

Coefficient of range $= \dfrac{175 - 139}{175 + 139} = \dfrac{36}{314} = 0.115$

Example 6.2. The following table gives the age distribution of
a group of 50 individuals.

Age (in years) :   16-20   21-25   26-30   31-35
No. of persons :    10      15      17       8

Calculate range and the coefficient of range.

Solution. Since age is a continuous variable we should first
convert the given classes into continuous classes. The first class
will then become 15.5-20.5 and the last class will become
30.5-35.5.

Largest value = 35.5 ; Smallest value = 15.5

Range = 35.5 - 15.5 = 20 years

Coefficient of range $= \dfrac{35.5 - 15.5}{35.5 + 15.5} = \dfrac{20}{51} = 0.39$


Example 6.3. Find the interquartile range of the following data :

Class Interval :  0-15   15-30   30-45   45-60   60-75   75-90   90-105
Frequency      :   8      26      30      45      20      17       4
(Bombay U. B.Com. Nov. 1982)

Solution.

COMPUTATION OF QUARTILES

Class Interval       f       (Less than) c.f.
0-15                 8              8
15-30               26             34
30-45               30             64
45-60               45            109
60-75               20            129
75-90               17            146
90-105               4            150
Total              150

Here N/4 = 37.5. The c.f. just greater than 37.5 is 64. Hence
$Q_1$ lies in the corresponding class 30-45. Writing l for the lower limit
of the quartile class, h for its width, f for its frequency and C for the
c.f. of the preceding class,

$Q_1 = l + \dfrac{h}{f}\left(\dfrac{N}{4} - C\right) = 30 + \dfrac{15}{30}(37.5 - 34) = 30 + 1.75 = 31.75$

3N/4 = 112.5. The c.f. just greater than 112.5 is 129. Hence $Q_3$
lies in the corresponding class 60-75.

$Q_3 = l + \dfrac{h}{f}\left(\dfrac{3N}{4} - C\right) = 60 + \dfrac{15}{20}(112.5 - 109) = 60 + 2.625 = 62.625$

Interquartile range $= Q_3 - Q_1 = 62.625 - 31.75 = 30.875$
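
The interpolation used in Example 6.3 can be written as a small routine. This is a sketch under stated assumptions, not a definitive implementation: the function name `grouped_quantile` is hypothetical, classes are assumed to be continuous (lower limit, upper limit) pairs, and the quartile class is taken to be the first class whose cumulative frequency exceeds the target, as in the solution above.

```python
def grouped_quantile(classes, frequencies, fraction):
    """Quantile of a grouped distribution by linear interpolation.

    classes     : list of (lower limit, upper limit) pairs, in ascending order
    frequencies : class frequencies in the same order
    fraction    : 0.25 for Q1, 0.75 for Q3, and so on (must be below 1)
    """
    target = fraction * sum(frequencies)
    cumulative = 0                       # c.f. of the classes already passed
    for (lower, upper), f in zip(classes, frequencies):
        if cumulative + f > target:      # c.f. just greater than the target
            return lower + (upper - lower) / f * (target - cumulative)
        cumulative += f
    raise ValueError("fraction must be less than 1")


# Data of Example 6.3.
classes = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75), (75, 90), (90, 105)]
frequencies = [8, 26, 30, 45, 20, 17, 4]

q1 = grouped_quantile(classes, frequencies, 0.25)
q3 = grouped_quantile(classes, frequencies, 0.75)
print(q1, q3, q3 - q1)
```

With the data of Example 6.3 the routine gives the same quartiles as the hand computation, 31.75 and 62.625.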


6.6. Quartile Deviation or Semi Inter-Quartile Range.
It is a measure of dispersion based on the upper quartile $Q_3$ and the
lower quartile $Q_1$.

Inter-Quartile Range $= Q_3 - Q_1$ ...(6.3)

Quartile deviation is obtained from inter-quartile range on
dividing by 2 and hence is also known as semi inter-quartile range.
Thus

Quartile Deviation (Q.D.) $= \dfrac{Q_3 - Q_1}{2}$ ...(6.4)

Q.D. as defined in (6.4) is only an absolute measure of dispersion.
For comparative studies of variability of two distributions
we need a relative measure which is known as Coefficient of
Quartile Deviation and is given by :

Coefficient of Q.D. $= \dfrac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$ ...(6.5)
Remarks. 1. The quartile deviation gives the average amount
by which the two quartiles differ from median. For a symmetrical
distribution we have (cf. Chapter 7)

$Q_3 - Md = Md - Q_1 \Rightarrow Md = \dfrac{Q_1 + Q_3}{2}$ ...(*)

i.e., median lies half way on the scale from $Q_1$ to $Q_3$. Thus for a
symmetrical distribution we have :

Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{Q_3 + Q_1}{2} - Q_1 = Md - Q_1$ [From (*)]

and Q.D. $= \dfrac{Q_3 - Q_1}{2} = Q_3 - \dfrac{Q_3 + Q_1}{2} = Q_3 - Md$ [From (*)]

In other words, for a symmetrical distribution we have :

$Q_1 = Md - Q.D.$ and $Q_3 = Md + Q.D.$ ...(**)

Since in a distribution 25% of the observations lie below $Q_1$
and 25% of the observations lie above $Q_3$, 50% of the observations lie
between $Q_1$ and $Q_3$. Therefore, using (**) we conclude that for a
symmetrical distribution Md ± Q.D. covers exactly 50% of the
observations.

2. Rigorously speaking, quartile deviation is only a positional
average and does not exhibit any scatter around an average. As
such some statisticians prefer to call it a measure of partition rather
than a measure of dispersion.

6.6.1. Merits and Demerits of Quartile Deviation.
Merits. Quartile deviation is quite easy to understand and calculate.
It has a number of obvious advantages over range as a measure
of dispersion. For example :

(a) As against range which was based on two observations
only, Q.D. makes use of 50% of the data and as such is obviously
a better measure than range.

(b) Since Q.D. ignores 25% of the data from the beginning of
the distribution and another 25% of the data from the top end,
it is not affected at all by extreme observations.

(c) Q.D. can be computed from the frequency distribution
with open end classes. In fact Q.D. is the only measure of dispersion
which can be obtained while dealing with a distribution having
open end classes.

Demerits. (i) Q.D. is not based on all the observations since
it ignores 25% of the data at the lower end and 25% of the data at
the upper end of the distribution. Hence, it cannot be regarded as
a reliable measure of variability.

(ii) Q.D. is affected considerably by fluctuations of sampling.

(iii) Q.D. is not suitable for further mathematical treatment.

Thus quartile deviation is not a reliable measure of variability,
particularly for distributions in which the variation is considerable.


6.7. Percentile Range. This is a measure of dispersion
based on the difference between certain percentiles. If $P_i$ is the
i-th percentile and $P_j$ is the j-th percentile then the so-called i-j
percentile range is given by :

i-j Percentile Range $= P_j - P_i$, $(i < j)$ ...(6.6)

Thus i-j Semi Percentile Range is given by :

$(P_j - P_i)/2$, $(i < j)$ ...(6.7)

The commonly used percentile range is the one which corresponds
to the 10th and 90th percentiles. Thus taking i = 10 and j = 90
in (6.6) and (6.7), we get :

10-90 Percentile Range $= P_{90} - P_{10}$ ...(6.8)

and 10-90 Semi-Percentile Range $= (P_{90} - P_{10})/2$ ...(6.8a)

The above measures are absolute measures only. The relative
measure of variability based on percentiles is given by :

Coefficient of 10-90 percentile range $= \dfrac{P_{90} - P_{10}}{P_{90} + P_{10}}$ ...(6.9)

Theoretically, 10-90 percentile range should serve as a better
measure of dispersion than Q.D. since it is based on 80% of the
data. However, in practice it is not commonly used.
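
Formulae (6.8), (6.8a) and (6.9) can be tried out on raw data. The sketch below is only an illustration: it uses `statistics.quantiles` from the Python standard library with the "inclusive" method, which is one of several percentile conventions and need not agree exactly with the textbook's interpolation rule, and the data (the integers 1 to 101) are hypothetical.

```python
from statistics import quantiles

data = list(range(1, 102))          # hypothetical raw data: 1, 2, ..., 101

# Nine cut points dividing the data into ten equal parts;
# the first is P10 and the last is P90.
deciles = quantiles(data, n=10, method="inclusive")
p10, p90 = deciles[0], deciles[-1]

percentile_range = p90 - p10                    # (6.8)
semi_percentile_range = percentile_range / 2    # (6.8a)
coefficient = (p90 - p10) / (p90 + p10)         # (6.9)

print(p10, p90, percentile_range, semi_percentile_range, round(coefficient, 4))
```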


Example 6.4. Evaluate an appropriate measure of dispersion
for the following data :

Income (in Rs.) : Less than 50   50-70   70-90   90-110   110-130   130-150   above 150
No. of persons  :      54         100     140     300       230       125        51
(Calicut U. B.Com. 1975)

Solution. Since we are given the classes with open end
intervals, the only measure of dispersion that we can compute is
the quartile deviation.

COMPUTATION OF QUARTILE DEVIATION

Income (in Rs.)     No. of persons (f)     Less than c.f.
Less than 50               54                    54
50-70                     100                   154
70-90                     140                   294
90-110                    300                   594
110-130                   230                   824
130-150                   125                   949
above 150                  51                  1000

Here N = 1000, N/4 = 250, 3N/4 = 750.

Since c.f. just greater than 250 is 294, $Q_1$ (first quartile) lies in
the corresponding class 70-90. Similarly, since c.f. just greater
than 750 is 824, the corresponding class 110-130 contains $Q_3$.
Hence :

$Q_1 = 70 + \dfrac{20}{140}(250 - 154) = 70 + \dfrac{96}{7} = 70 + 13.714 = 83.714$

$Q_3 = 110 + \dfrac{20}{230}(750 - 594) = 110 + \dfrac{2 \times 156}{23} = 110 + 13.565 = 123.565$

Quartile deviation is given by :

Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{123.565 - 83.714}{2} = \dfrac{39.851}{2} = 19.9255$


Example 6.5. From the following data, calculate the percentage
of workers getting wages :

(a) more than Rs. 44,
(b) between Rs. 22 and Rs. 58, and
(c) the quartile deviation.

Wages (Rs.)    :  0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of workers :   20      45      85     160      70      55      35      30
[C.A. (Intermediate) May 1976]

Solution. Parts (a) and (b) have been done in Example 5.27
in the previous Chapter 5. There we also obtained :

$Q_1 = 27.06$, $Q_3 = 49.29$

Q.D. $= \dfrac{Q_3 - Q_1}{2} = \dfrac{49.29 - 27.06}{2} = \dfrac{22.23}{2} = 11.115$

EXERCISE 6.1


1. (a) Show how measures of dispersion help in explaining that though
frequency distributions may have the same values of their averages they may
differ in their respective formations. In what respect are measures of
dispersion of use in Statistics ?

(b) "Frequency distributions may either differ in the numerical size of
their averages though not necessarily in their formations or they may have the
same values of their averages yet differ in their respective formations."

Explain and illustrate how the measures of dispersion afford a
supplement to the information about the frequency distributions given by the
averages.

(c) Discuss the validity of the following statement : "An average, when
published, should be accompanied by a measure of dispersion, for significant
interpretation."

2. From the monthly income of 10 families given below, calculate
(a) the median, (b) the geometric mean, (c) the coefficient of range.

S. No.        :   1    2    3    4    5    6    7    8    9   10
Income in Rs. : 145  367  268   73  185  619  280  115  870  315
Ans. (a) Md = Rs. 274, (b) G = Rs. 252.4, (c) Coeff. of Range = 0.84


3. The index numbers of prices of cotton and coal shares in 1972 were
as under :

Month         Index number of prices    Index number of prices
              of cotton shares          of coal shares
January              188                       131
February             178                       130
March                173                       130
April                164                       129
May                  172                       129
June                 183                       120
July                 184                       127
August               185                       127
September            211                       130
October              217                       137
November             232                       140
December             240                       142

Calculate range for each share. Hence discuss which share do you
consider more variable in price.

Ans. Range (Cotton) = 76, Coefficient of Range (Cotton) = 0.19
Range (Coal) = 22, Coefficient of Range (Coal) = 0.084
Cotton shares are more variable in prices.


4. Age distribution of 200 employees of a firm is given below. Construct
a 'less than' ogive curve, and hence or otherwise calculate semi-interquartile
range $(Q_3 - Q_1)/2$ of the distribution :

Age in years (less than) :  25           30   35   40    45    50    55
No. of employees         :  [illegible]  25   75   130   170   189   200
(Bombay Uni. B.Com. May 1980)

Ans. $Q_1 = 33.5$ years, $Q_3 = 43$ years, $(Q_3 - Q_1)/2 = 4.75$ years


5. Calculate the quartile deviation and its coefficient from the following
data :

Class Interval   Frequency     Class Interval   Frequency
10-15                4         30-40               10
15-20               12         40-50                8
20-25               16         50-60                6
25-30               22         60-70                4
[C.A. (Intermediate) Nov. 1976]
Ans. Q.D. = 8.05 ; Coefficient of Q.D. = 0.2733
6. Compute the Coefficient of Quartile Deviation of the following
data :

Size      Frequency     Size      Frequency
4-8           6         24-28        12
8-12         10         28-32        10
12-16        18         32-36         6
16-20        30         36-40         2
20-24        15
[Delhi Uni. B.Com. (External) 1982]
Ans. $Q_1 = 14.5$, $Q_3 = 24.92$, Coeff. of Q.D. = 0.2643

7. Calculate the appropriate measure of dispersion from the following
data :

Wages in Rs.    No. of wage     Wages in Rs.    No. of wage
per week        earners         per week        earners
Less than 35       14           41-43              18
35-37              62           over 43             7
38-40              99

Ans. Coeff. of Q.D. = 0.046

8. Find out middle 50%, middle 80% and Coefficient of Q.D. from the
following table :

Size of items :  2   4   6   8   10   12
Frequency     :  3   5  10  12    6    4

Ans. Quartile range = 4 ; Percentile range = 8 ; Coefficient of Q.D. = 0.25


9. Calculate the coefficient of quartile deviation for the following
data :

Wages (in Rs.)   No. of labourers    Wages (in Rs.)   No. of labourers
60-64                 12             76-80                 12
64-68                 18             80-84                  8
68-72                 16             84-88                  8
72-76                 14
(Andhra Pradesh Uni. B.Com., Oct. 1976)
Ans. Q.D. = 5.89 [$Q_1 = 66.22$, $Q_3 = 78$]


6.8. Mean Deviation or Average Deviation. As already
pointed out, the two measures of dispersion discussed so far viz.,
range and quartile deviation are not based on all the observations
and also they do not exhibit any scatter of the observations from an
average and thus completely ignore the composition of the series.
Mean Deviation or the Average Deviation overcomes both these
drawbacks. As the name suggests, this measure of dispersion is obtained
on taking the average (arithmetic mean) of the deviations of the
given values from a measure of central tendency. According to
Clark and Schkade :

"Average deviation is the average amount of scatter of the items
in a distribution from either the mean or the median, ignoring the
signs of the deviations. The average that is taken of the scatter is an
arithmetic mean, which accounts for the fact that this measure is often
called the mean deviation."

6.8.1. Computation of Mean Deviation. If $X_1, X_2, ..., X_n$
are n given observations then the mean deviation (M.D.) about an
average A, say, is given by :

M.D. (about an average A) $= \dfrac{1}{n} \sum |X - A| = \dfrac{1}{n} \sum |d|$ ...(6.10)

where $|d| = |X - A|$, read as mod $(X - A)$, is the modulus value
or absolute value of the deviation (after ignoring the negative sign)
$d = X - A$, $\sum |d|$ is the sum of these absolute deviations and A
is any one of the averages Mean (M), Median (Md) and Mode (Mo).


Steps. (i) Calculate the average A of the distribution by the
usual methods.

(ii) Take the deviation $d = X - A$ of each observation from the
average A.

(iii) Ignore the negative signs of the deviations, taking all the
deviations to be positive to obtain the absolute deviations
$|d| = |X - A|$.

(iv) Obtain the sum of the absolute deviations obtained in
step (iii).

(v) Divide the total obtained in step (iv) by n, the number of
observations. The result gives the value of the mean deviation about
the average A.
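
The steps above can be sketched in code as follows. This is an illustration only: the function name `mean_deviation` and the five observations are hypothetical, and the average A is passed in as an argument so that the mean, the median or the mode can be used.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Mean of the absolute deviations of `values` from the point `about`."""
    absolute_deviations = [abs(x - about) for x in values]   # steps (ii), (iii)
    return sum(absolute_deviations) / len(values)            # steps (iv), (v)


observations = [2, 4, 6, 8, 10]     # hypothetical raw data

md_about_mean = mean_deviation(observations, mean(observations))
md_about_median = mean_deviation(observations, median(observations))
print(md_about_mean, md_about_median)
```

Here the data are symmetrical, so the mean and the median are both 6 and the two mean deviations agree.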


In the case of frequency distribution or grouped or continuous
frequency distribution, mean deviation about an average A is given
by :

M.D. (about the average A) $= \dfrac{1}{N} \sum f |X - A| = \dfrac{1}{N} \sum f |d|$ ...(6.11)

where X is the value of the variable (or the mid-value of the class in
the case of a grouped or continuous distribution), f is the corresponding
frequency, $N = \sum f$ is the total frequency and $|X - A|$ is the
absolute value of the deviation $d = X - A$ of the given values of X from
the average A (Mean, Median or Mode).

Steps : (i), (ii) and (iii) are the same as given above.

(iv) Multiply the absolute deviations $|d| = |X - A|$ by the
corresponding frequency f to get $f\,|d|$.

(v) Take the total of the products in step (iv) to obtain $\sum f\,|d|$.

(vi) Divide the total in step (v) by N, the total frequency.

The resulting value is the mean deviation about the average A.
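
For a frequency distribution the same computation only needs the frequencies as weights. The sketch below is an illustration under stated assumptions: the helper name `grouped_mean_deviation` and the small distribution (classes 2–4, 4–6, 6–8, 8–10 represented by their mid-values) are chosen here for the example.

```python
def grouped_mean_deviation(mid_values, freqs, a):
    """M.D. about the average `a` for a frequency distribution, as in (6.11)."""
    n_total = sum(freqs)  # N, the total frequency
    # Steps (iv)-(v): multiply each |X - A| by its frequency and add up.
    weighted_abs_dev = sum(f * abs(x - a) for x, f in zip(mid_values, freqs))
    # Step (vi): divide by N.
    return weighted_abs_dev / n_total


mid_values = [3, 5, 7, 9]  # mid-values of the classes 2-4, 4-6, 6-8, 8-10
freqs = [3, 4, 2, 1]       # illustrative frequencies

grouped_mean = sum(f * x for x, f in zip(mid_values, freqs)) / sum(freqs)
md = grouped_mean_deviation(mid_values, freqs, grouped_mean)
print(round(grouped_mean, 2), round(md, 2))
```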

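Remarks 1. Usually, we obtain the mean deviation (M.D.)
about any one of the three averages mean (M), median (Md) or
mode (Mo). Thus


$$\begin{aligned}
\text{M.D. (about mean)} &= \frac{1}{N}\sum f\,|X - M| \\
\text{M.D. (about median)} &= \frac{1}{N}\sum f\,|X - Md| \\
\text{M.D. (about mode)} &= \frac{1}{N}\sum f\,|X - Mo|
\end{aligned} \qquad \ldots(6.12)$$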

2. The sum of the absolute deviations (after ignoring the
signs) of a given set of observations is minimum when taken about the
median. Hence mean deviation is minimum when it is calculated
from the median. In other words, mean deviation calculated about the
median can never exceed mean deviation about the mean or the mode.
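
The minimising property of the median can be checked numerically. The following sketch is only an illustration: the data set and the grid of candidate points are assumptions made for the example.

```python
from statistics import mean, median


def sum_abs_dev(values, a):
    """Sum of the absolute deviations of the values from the point a."""
    return sum(abs(x - a) for x in values)


data = [1, 2, 3, 4, 100]  # hypothetical, deliberately skewed observations

# Try every point 0, 0.5, 1, ..., 100 as the "average" A and keep the best one.
candidates = [c / 2 for c in range(0, 201)]
best_point = min(candidates, key=lambda c: sum_abs_dev(data, c))

print(best_point, median(data))          # the minimising point is the median
print(sum_abs_dev(data, median(data)))   # total absolute deviation about the median
print(sum_abs_dev(data, mean(data)))     # a larger total about the mean
```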


3. As already pointed out in Remark 1 above, usually we compute
the mean deviation about any one of the three averages mean,
median or mode. But since the mode is generally ill defined, in practice
M.D. is computed about the mean or the median. Further, as a choice
between mean and median, theoretically the median should be preferred
since M.D. is minimum when calculated about the median (cf. Remark
2). But because of the wide applications of the mean in Statistics as a
measure of central tendency, in practice mean deviation is generally
computed from the mean.

5. For a symmetrical distribution the range Mean ± M.D.
(about mean) or Md ± M.D. (about median) [since M = Md for a
symmetrical distribution] covers 57.5% of the observations of the
distribution. If the distribution is moderately skewed, the range
will cover approximately 57.5% of the observations.


6.8.2. Short-cut Method of Computing Mean Deviation.
For computing the mean deviation, first of all we have to calculate
the average about which we want the mean deviation by the
methods discussed in the previous chapter. In case the average is
a whole number, the method of computing mean deviation by
formulae (6.10), (6.11) or (6.12) is quite convenient. But if the
average comes out to be a fraction, the method becomes tedious
and time consuming and requires a lot of arithmetical calculations. In
such a case the deviations may be taken from an arbitrary constant a
near the average, using the formula :


$$\text{M.D. (about Mean)} = \frac{1}{N}\Big[\sum f\,|X - a| + (M - a)\big(\textstyle\sum f_B - \sum f_A\big)\Big] \qquad \ldots(6.13)$$

where

M is the mean,

a is the arbitrary constant near the mean,

$\sum f_B$ is the sum of all the class frequencies before and including
the mean value,

and $\sum f_A$ is the sum of all the class frequencies after the mean value.


Similarly, using the short-cut method,

$$\text{M.D. (about median)} = \frac{1}{N}\Big[\sum f\,|X - a| + (Md - a)\big(\textstyle\sum f_B' - \sum f_A'\big)\Big] \qquad \ldots(6.13a)$$


where now Md is the median, a is some arbitrary constant near the
median, $\sum f_B'$ is the sum of the class frequencies before and including
the median value, and $\sum f_A'$ is the sum of the class frequencies
after the median value.


Remarks 1. Obviously,

$$\sum f_A + \sum f_B = N \qquad \ldots(6.14)$$

$$\Rightarrow\quad \sum f_B = N - \sum f_A \quad\text{or}\quad \sum f_A = N - \sum f_B$$

Similarly

$$\sum f_B' = N - \sum f_A' \qquad \ldots(6.14a)$$

2. The above formulae (6.13) and (6.13a) are true provided all
the values of the variable which are above the average (M or Md)
are also above 'a' and those which are below the average are also
below 'a'. The arbitrary constant 'a' should be taken as some
integral value near the average value, i.e., it should be a value
in the average class. The short-cut method will not yield the correct
result if 'a' is taken outside the average class.
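
A short numerical check of formula (6.13) against the direct method may help. The sketch below is an illustration only: the function names and the data are assumptions, and the "before" frequencies are taken to be those of the values not exceeding the average, which agrees with the description above whenever the condition of Remark 2 holds.

```python
def md_direct(mid_values, freqs, avg):
    """Mean deviation about `avg` by the direct formula (6.11)."""
    return sum(f * abs(x - avg) for x, f in zip(mid_values, freqs)) / sum(freqs)


def md_shortcut(mid_values, freqs, avg, a):
    """Mean deviation about `avg` by the short-cut formula (6.13)."""
    # Remark 2: no value of the variable may lie strictly between a and the average.
    if any(min(a, avg) < x < max(a, avg) for x in mid_values):
        raise ValueError("'a' is too far from the average for the short-cut method")
    n_total = sum(freqs)
    sum_f_abs_d = sum(f * abs(x - a) for x, f in zip(mid_values, freqs))
    f_before = sum(f for x, f in zip(mid_values, freqs) if x <= avg)
    f_after = n_total - f_before  # relation (6.14)
    return (sum_f_abs_d + (avg - a) * (f_before - f_after)) / n_total


mid_values = [3, 5, 7, 9]  # illustrative mid-values
freqs = [3, 4, 2, 1]       # illustrative frequencies
avg = 5.2                  # mean of this distribution

direct_value = md_direct(mid_values, freqs, avg)
shortcut_value = md_shortcut(mid_values, freqs, avg, a=5)
print(round(direct_value, 4), round(shortcut_value, 4))
```

Choosing `a` far from the average (for instance `a=8` here) triggers the error, which mirrors the warning in Remark 2.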


6.8.3. Merits and Demerits of Mean Deviation 


Merits : (i) Mean deviation is rigidly defined and is easy to
understand and calculate.

(ii) Mean deviation is based on all the observations and is
thus definitely a better measure of dispersion than the range and
quartile deviation.

(iii) The averaging of the absolute deviations from an
average irons out the irregularities in the distribution and thus
mean deviation provides an accurate and true measure of dispersion.

(iv) As compared with standard deviation (discussed in the next
article, § 6.9) it is less affected by extreme observations.

(v) Since mean deviation is based on the deviations about an
average, it provides a better measure for comparison about the
formation of different distributions.


Demerits. (i) The strongest objection against mean deviation
is that while computing its value we take the absolute value of
the deviations about an average and ignore the signs of the
deviations.

(ii) The step of ignoring the signs of the deviations is
mathematically unsound and illogical. It creates artificiality and renders
mean deviation useless for further mathematical treatment. This
drawback necessitates another measure of variability
which, in addition to being based on all the observations, is
also amenable to further algebraic manipulations.

(iii) It is not a satisfactory measure when taken about the mode
or while dealing with a fairly skewed distribution. As already
pointed out, theoretically mean deviation gives the best result when
it is calculated about the median. But the median is not a satisfactory
measure when the distribution has great variations.

(iv) It is rarely used in sociological studies.

(v) It cannot be computed for distributions with open end
classes.

6.8.4. Uses. In spite of its mathematical drawbacks, mean
deviation has found favour with economists and business statisticians
because of its simplicity, accuracy and the fact that standard
deviation (discussed in § 6.9) gives greater weightage to the
deviations of extreme observations. Mean deviation is frequently useful
in computing the distribution of personal wealth in a community
or a nation since for this, extremely rich as well as extremely poor
people should be taken into consideration. Regarding the practical
utility of mean deviation as a measure of variability it may be
worthwhile to quote that in the studies relating to forecasting business
cycles, the National Bureau of Economic Research has found
that the mean deviation is the most practical measure of dispersion to
use for this purpose.


6.8.5. Relative Measures of Mean Deviation. The
measures of mean deviation as defined in (6.10), (6.11) and (6.12)
are absolute measures depending on the units of measurement. The
relative measure of dispersion, called the coefficient of mean deviation,
is given by :


$$\text{Coefficient of M.D.} = \frac{\text{Mean Deviation}}{\text{Average about which it is calculated}} \qquad \ldots(6.15)$$

Thus :

$$\text{Coefficient of M.D. about mean} = \frac{\text{M.D.}}{\text{Mean}} \qquad \ldots(6.15a)$$

$$\text{Coefficient of M.D. about median} = \frac{\text{M.D.}}{\text{Median}} \qquad \ldots(6.15b)$$


The coefficients of mean deviation defined in (6.15), (6.15a)
and (6.15b) are pure numbers independent of the units of measurement
and are useful for comparing the variability of different
distributions.
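
Because the coefficient is a pure number, it allows series measured in different units to be compared. A minimal sketch follows; the two series (heights in cm and weights in kg) and the function name are hypothetical and serve only as an illustration.

```python
from statistics import median


def coefficient_of_md(values):
    """Coefficient of mean deviation about the median, as in (6.15b)."""
    md_point = median(values)
    mean_dev = sum(abs(x - md_point) for x in values) / len(values)
    return mean_dev / md_point


heights_cm = [150, 160, 170, 180, 190]  # hypothetical series
weights_kg = [50, 60, 70, 80, 90]       # hypothetical series

coeff_heights = coefficient_of_md(heights_cm)
coeff_weights = coefficient_of_md(weights_kg)
print(round(coeff_heights, 4), round(coeff_weights, 4))
```

Both series have the same absolute mean deviation (12 units), yet the weights turn out to be relatively more variable because their median is smaller.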


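Example 6.6. Calculate the mean deviation from mean for the
following data.


Class Interval : 2–4 4–6 6–8 8–10
Frequency : 3 4 2 1

Solution.
COMPUTATION OF MEAN AND M.D. FROM MEAN

Class      Mid-value   Frequency    fX     |X − X̄|    f|X − X̄|    |d| = |X − 5|    f|d| *
Interval     (X)         (f)
2–4           3           3          9       2.2        6.6            2             6
4–6           5           4         20       0.2        0.8            0             0
6–8           7           2         14       1.8        3.6            2             4
8–10          9           1          9       3.8        3.8            4             4
Total                   N = 10   ΣfX = 52           Σf|X − X̄| = 14.8           Σf|d| = 14

$$\bar{X} = \frac{1}{N}\sum fX = \frac{52}{10} = 5.2$$

$$\text{M.D. about mean} = \frac{1}{N}\sum f\,|X - \bar{X}| = \frac{14.8}{10} = 1.48$$

* The last column f|d| is not required for this method. It is
needed for the short-cut method given below.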


Aliter. Short-cut Method. We can use the deviations from the
arbitrary point a = 5 directly to compute the M.D. from mean without
computing the values $|X - \bar{X}|$. This is particularly useful when
$\bar{X}$ is in fractions (decimals), in which case the usual formula is quite
laborious. For this we need the last column f|d| [cf. (6.13),
§ 6.8.2]. Using the formula (6.13), we get :


$$\text{M.D. about mean} = \frac{1}{N}\Big[\sum f\,|d| + (\bar{X} - a)\big(\textstyle\sum f_B - \sum f_A\big)\Big]$$

where :

$\sum f_B$ = Sum of all the class frequencies before and including the
mean class, i.e., the class in which the mean lies.

Here the mean is 5.2 and it lies in the class 4–6.

$\sum f_B = 4 + 3 = 7$

$\sum f_A$ = Sum of all the class frequencies after the mean class
= 2 + 1 = 3

or $\sum f_A = N - \sum f_B = 10 - 7 = 3$.

$$\therefore\quad \text{M.D. about mean} = \frac{1}{10}\big[14 + (5.2 - 5)(7 - 3)\big] = \frac{14 + 0.8}{10} = 1.48$$


Remark. The value obtained by the short-cut method
coincides with the value obtained by the direct method, and rightly
so, because the arbitrary point a = 5 is near the mean value 5.2
(cf. Remark 2, § 6.8.2).


Example 6.7. Calculate the value of the coefficient of mean
deviation (from median) of the following data :


Marks     No. of Students     Marks     No. of Students
10–20          2              50–60          25
20–30          6              60–70          20
30–40         12              70–80          10
40–50         18              80–90           7

(Delhi U. B. Com. 1982)
Solution.

CALCULATIONS FOR MEAN DEVIATION FROM MEDIAN

Marks     Mid-value    f     c.f.    |d| = |X − Md|     f|d|
            (X)                      = |X − 54.8|
10–20        15        2       2         39.8           79.6
20–30        25        6       8         29.8          178.8
30–40        35       12      20         19.8          237.6
40–50        45       18      38          9.8          176.4
50–60        55       25      63          0.2            5.0
60–70        65       20      83         10.2          204.0
70–80        75       10      93         20.2          202.0
80–90        85        7     100         30.2          211.4
Total               N = 100                       Σf|d| = 1294.8

We have N/2 = 50. The c.f. just greater than 50 is 63. Hence the
corresponding class, viz., 50–60, is the median class.

$$\text{Median} = l + \frac{h}{f}\Big(\frac{N}{2} - C\Big) = 50 + \frac{10}{25}(50 - 38) = 50 + 4.8 = 54.8$$

$$\text{M.D. about median} = \frac{1}{N}\sum f\,|X - Md| = \frac{1}{N}\sum f\,|d| = \frac{1294.8}{100} = 12.948 \simeq 12.95$$

Hence the coefficient of mean deviation (from median) is :

$$\frac{\text{M.D. about Md}}{\text{Median}} = \frac{12.95}{54.8} = 0.2363$$


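Example 6.8. Calculate mean deviation from median from the
following data :


Marks less than : 80 70 60 50 40 30 20 10
No. of Students : 100 90 80 60 32 20 13 5
[Delhi U. B. Com. (Hons.) 1973]


Solution. First of all we shall convert the given cumulative
frequency distribution table into an ordinary frequency distribution, as
given in the following table :

COMPUTATION OF M.D. FROM MEDIAN

Marks     Mid-value    f     Less than    |X − Md|          f|X − Md|
            (X)               c.f.        = |X − 46.43|
0–10          5        5        5            41.43            207.15
10–20        15        8       13            31.43            251.44
20–30        25        7       20            21.43            150.01
30–40        35       12       32            11.43            137.16
40–50        45       28       60             1.43             40.04
50–60        55       20       80             8.57            171.40
60–70        65       10       90            18.57            185.70
70–80        75       10      100            28.57            285.70
Total               N = 100                            Σf|X − Md| = 1428.60

Here N/2 = 50. Since the c.f. just greater than 50 is 60, the
corresponding class 40–50 is the median class.

$$Md = l + \frac{h}{f}\Big(\frac{N}{2} - C\Big) = 40 + \frac{10}{28}(50 - 32) = 40 + 6.43 = 46.43$$

$$\text{M.D. about Md} = \frac{1}{N}\sum f\,|X - Md| = \frac{1428.6}{100} = 14.286 \simeq 14.29$$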

Aliter. Since the median value comes out to be in fractions, we
can do the above question conveniently by the short-cut method, i.e.,
by taking the deviations from any arbitrary point, a = 45 (say), lying
in the median class.

M.D. BY SHORT-CUT METHOD

Marks     Mid-value    f     |d| = |X − 45|     f|d|
            (X)
0–10          5        5          40            200
10–20        15        8          30            240
20–30        25        7          20            140
30–40        35       12          10            120
40–50        45       28           0              0
50–60        55       20          10            200
60–70        65       10          20            200
70–80        75       10          30            300
Total               N = 100               Σf|d| = 1400

Using formula (6.13a) we get :

$$\text{M.D. about Md} = \frac{1}{N}\Big[\sum f\,|d| + (Md - a)\big(\textstyle\sum f_B' - \sum f_A'\big)\Big]$$

where
$\sum f_B'$ = Sum of the frequencies before and including the median
value, viz., 46.43
= 5 + 8 + 7 + 12 + 28 = 60

$\sum f_A' = N - \sum f_B' = 100 - 60 = 40$

$$\therefore\quad \text{M.D. about Md} = \frac{1}{100}\big[1400 + (46.43 - 45)(60 - 40)\big] = \frac{1}{100}\big[1400 + 1.43 \times 20\big] = \frac{1400 + 28.6}{100} = \frac{1428.6}{100} = 14.286 \simeq 14.29$$

Remark. As in the last question, the values of M.D. obtained
by the direct method and the short-cut method are the same, since
the arbitrary value a is taken in the median class.
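
The whole of Example 6.8 (median of the grouped data, direct M.D. and the short-cut check) can be reproduced as follows. This is a sketch only: the helper name `grouped_median` and the variable names are assumptions made for the illustration, and the results differ from the hand computation only through rounding of the median.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a grouped distribution: l + (h / f) * (N/2 - C)."""
    half = sum(freqs) / 2
    cum_before = 0  # C: cumulative frequency of the classes before the median class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= half:
            return lower + width * (half - cum_before) / f
        cum_before += f
    raise ValueError("empty distribution")


lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [5, 8, 7, 12, 28, 20, 10, 10]  # ordinary frequencies of Example 6.8
width = 10
mid_values = [lower + width / 2 for lower in lower_limits]
n_total = sum(freqs)

md_point = grouped_median(lower_limits, width, freqs)
direct = sum(f * abs(x - md_point) for x, f in zip(mid_values, freqs)) / n_total

# Short-cut method (6.13a) with the arbitrary point a = 45 in the median class.
a = 45
f_before = sum(f for x, f in zip(mid_values, freqs) if x <= md_point)
f_after = n_total - f_before
sum_f_abs_d = sum(f * abs(x - a) for x, f in zip(mid_values, freqs))
shortcut = (sum_f_abs_d + (md_point - a) * (f_before - f_after)) / n_total

print(round(md_point, 2), round(direct, 2), round(shortcut, 2))
```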


EXERCISE 6.2 
1. What do you mean by 'mean deviation' ? Discuss its relative merits
over range and quartile deviation as a measure of dispersion. Also point out
its limitations.
2. Calculate mean deviation about A.M. from the following :
Values (x) : 10 11 12 13
Frequency (f) : 3 12 18 12
[U.C. A. (Intermediate), Dec. 1983]


Ans. A.M. = 11.87 ; M.D. = 0.71
3. Calculate the mean deviation of the series :


[Delhi U. B. Com. (External) 1981]
Ans. M.D. (about median) = 2.22


4. Compute the quartile deviation and mean deviation from the
following data.


Height in inches No. of students Height in inches No. of students 


58 15 63 22 
59 20 64 22 
ot 5 & E 
62 33 — 


(Punjab University B. Com. II, Sept. 1981)
Ans. Q.D. = 1.5 ; M.D. (about median) = 1.73


5. With median as the base calculate the mean deviation and compare
the variability of the two series A and B.


Series A : 3484 4572 4124 3682 5624 4388 3680 4308
Series B : 487 508 620 382 408 266 186 218
[C.A. (Intermediate) Nov. 1973]


Ans. Series A : Md = 4216 ; M.D. = 490.25 ; Coeff. of M.D. = 0.116
Series B : Md = 395 ; M.D. = 121.38 ; Coeff. of M.D. = 0.307. Series B is more
variable.

6. Compare the dispersion of the following series by using the
coefficient of mean deviation.
Age (years)  : 16 17 18 19 20 21 22 23 24 Total
No. of boys  :  4  5  7 12 20 13  5  0  4   70
No. of girls :  2  0  4  8 15 10  6  3  2   50
(Saurashtra U. B. Com. 1975)


Ans. Coeff. of M.D. about median (boys) = 0.0685
Coeff. of M.D. about median (girls) = 0.0630

7. Find the mean deviation from median of the distribution given
below :
No. of accidents : 0 1 2 3 4 5 6 7 8 9 10 11 12
Persons having said
No. of accidents : 15 16 21 10 17 8 4 2 1 2 2 0 2


Ans. M.D. (from median) = 1.96

8. Find out the mean deviation and its coefficient from median from
the following series :

Size of items : 4 6 8 10 12 14 16
Frequency : 2 1 3 3 1
(Guru Nanak Dev Uni. B. Com. II April 1982)

Ans. 2.4 ; 0.24

9. Calculate the mean deviation of the following series from the
mean :


Monthly Wages      No. of       Monthly Wages      No. of
of Workers         Workers      of Workers         Workers
(in Rs.)                        (in Rs.)

200–250               7         500–550              25
250–300              13         550–600              10
300–350              15         600–650               8
350–400              24         650–700               6
400–450              36         700–750               4
450–500              50         750 and above         2


[Delhi Uni. B. Com. 1976]
Ans. Mean = 452.25 ; M.D. = 86.387


10. Calculate mean deviation from median from the following 


data : 

Class interval (f) Class interval (f) 
20—25 6 50—55 10 
25—30 12 55—60 8 
30—40 17 60—70 5 
40—45 30 70—80 2 
45—50 10 


Also calculate coefficient of mean deviation. 
(Delhi Uni. B. Com. 1977) 


Ans. 8.75 ; 0.206


11. Calculate the coefficient of mean deviation from mean and median
from the following data :


Class : 0–10 10–20 20–30 30–40 40–50 50–60 60–70 70–80
Frequency : 18 16 15 12 10 5 2 2
[Delhi Uni. B. Com. (Hons.) 1976]
Ans. 56.30% ; 61.65%


12. Compute the mean deviation from the median and from the mean for
the following distribution of the scores of 50 college students.


Score        Frequency     Score        Frequency
140–150          4         170–180         18
150–160          6         180–190          9
160–170         10         190–200          3


(Kurukshetra U. B. Com. 1980 ; Mysore Uni. B. Com. April 1981)
Ans. 10.24 ; 10.56


13. Calculate Mean Deviation from Median from the following data :


Wages in Rs.
(Mid-Value) : 125 175 225 275 325
No. of persons : 3 8 21 6 2


[Delhi Uni. B. Com. (External) 1978]
Ans. Median = 221.43 ; M.D. (Median) = 31.607

14. Find the mean deviation around the median :


Size                 Frequency
1 and up to 10           1
1 and up to 20           3
1 and up to 30           6
1 and up to 40           8
1 and up to 50          10
Ans. 10.334


6.9. Standard Deviation. Standard deviation, usually
denoted by the letter σ (small sigma) of the Greek alphabet, was
first suggested by Karl Pearson as a measure of dispersion in 1893.
It is defined as the positive square root of the arithmetic mean of the
squares of the deviations of the given observations from their
arithmetic mean. Thus if $X_1, X_2, \ldots, X_n$ is a set of n observations then
its standard deviation is given by :


$$\sigma = \sqrt{\frac{1}{n}\sum (X - \bar{X})^2} \qquad \ldots(6.16)$$


where

$$\bar{X} = \frac{1}{n}\sum X \qquad \ldots(6.16a)$$

is the arithmetic mean of the given values.


Computation of Standard Deviation.

Steps : (i) Compute the arithmetic mean $\bar{X}$ by the formula
(6.16a).

(ii) Compute the deviation $(X - \bar{X})$ of each observation from the
arithmetic mean, i.e., obtain $X_1 - \bar{X},\ X_2 - \bar{X},\ \ldots,\ X_n - \bar{X}$.

(iii) Square each of the deviations obtained in step (ii), i.e.,
compute $(X_1 - \bar{X})^2,\ (X_2 - \bar{X})^2,\ \ldots,\ (X_n - \bar{X})^2$.

(iv) Find the sum of the squared deviations in step (iii), given
by :

$$\sum (X - \bar{X})^2 = (X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2$$

(v) Divide this sum in step (iv) by n to obtain
$\frac{1}{n}\sum (X - \bar{X})^2$.

(vi) Take the positive square root of the value obtained in
step (v).

(vii) The resulting value gives the standard deviation of the
distribution.
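
The steps may be followed literally in code. The sketch below is an illustration with hypothetical observations; `statistics.pstdev` from the Python standard library computes the same quantity (division by n, not n − 1) and is used only as a cross-check.

```python
from math import sqrt
from statistics import pstdev


def standard_deviation(values):
    """Standard deviation as defined in (6.16): divisor n, positive root."""
    n = len(values)
    mean_x = sum(values) / n                          # step (i)
    squared_devs = [(x - mean_x) ** 2 for x in values]  # steps (ii)-(iii)
    return sqrt(sum(squared_devs) / n)                # steps (iv)-(vi)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
sd = standard_deviation(data)
print(sd, pstdev(data))
```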

Computation of Standard Deviation in Case of Frequency
Distribution. In the case of a frequency distribution, the standard
deviation is given by :

$$\sigma = \sqrt{\frac{1}{N}\sum f\,(X - \bar{X})^2} \qquad \ldots(6.17)$$


where X is the value of the variable or the mid-value of the class
(in the case of grouped or continuous frequency distributions) ; f is the
corresponding frequency of the value X ; $N = \sum f$ is the total
frequency and

$$\bar{X} = \frac{1}{N}\sum fX \qquad \ldots(6.17a)$$


is the arithmetic mean of the distribution.


Steps. (i) Compute $\bar{X}$ by the formula (6.17a) or the usual
step-deviation formula discussed in the previous chapter.

(ii) Compute the deviations $(X - \bar{X})$ for each value of the
variable.

(iii) Obtain the squares of the deviations obtained in step (ii),
i.e., compute $(X - \bar{X})^2$.

(iv) Multiply each of the squared deviations by the corresponding
frequency to get $f\,(X - \bar{X})^2$.

(v) Find the sum of the values obtained in step (iv) to get
$\sum f\,(X - \bar{X})^2$.

(vi) Divide the sum obtained in step (v) by N, the total
frequency.

(vii) The positive square root of the value obtained in step
(vi) gives the standard deviation of the distribution.
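
For a frequency distribution the steps differ only in the weighting by f. A minimal sketch, assuming an illustrative distribution given by mid-values and frequencies (the function name is likewise an assumption):

```python
from math import sqrt


def grouped_sd(mid_values, freqs):
    """Mean and standard deviation of a frequency distribution, as in (6.17)."""
    n_total = sum(freqs)                                           # N
    mean_x = sum(f * x for x, f in zip(mid_values, freqs)) / n_total  # (6.17a)
    # Steps (ii)-(v): frequency-weighted sum of squared deviations.
    weighted_sq = sum(f * (x - mean_x) ** 2 for x, f in zip(mid_values, freqs))
    # Steps (vi)-(vii): divide by N and take the positive square root.
    return mean_x, sqrt(weighted_sq / n_total)


mid_values = [3, 5, 7, 9]  # illustrative mid-values
freqs = [3, 4, 2, 1]       # illustrative frequencies
mean_x, sigma = grouped_sd(mid_values, freqs)
print(round(mean_x, 2), round(sigma, 4))
```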


Remarks. 1. It may be pointed out that although mean
deviation could be calculated about any one of the averages (M, Md
or Mo), standard deviation is always computed about the arithmetic
mean.

2. To be more precise, the standard deviation of the variable
X will be denoted by $\sigma_x$. This notation will be useful when we
have to deal with the standard deviations of two or more variables.

3. Standard deviation, abbreviated as S.D. or s.d., is always
taken as the positive square root in (6.16) or (6.17).

4. The value of s.d. depends on the numerical value of the
deviations $(X_1 - \bar{X}), (X_2 - \bar{X}), \ldots, (X_n - \bar{X})$. Thus the value of σ will
be greater if the values of X are scattered widely away from the
mean. Thus a small value of σ will imply that the distribution is
homogeneous and a large value of σ will imply that it is
heterogeneous. In particular, s.d. is zero if each of the deviations is zero,
i.e., σ = 0 if and only if the variable assumes a constant value, i.e.,

$$\sigma = 0 \iff X_1 = X_2 = X_3 = \ldots = X_n = k \ \text{(constant)}$$

6.9.1. Merits and Demerits of Standard Deviation


Merits. Standard deviation is by far the most important and
widely used measure of dispersion. It is rigidly defined and based
on all the observations. The squaring of the deviations $(X - \bar{X})$
removes the drawback of ignoring the signs of deviations in
computing the mean deviation. This step renders it suitable for further
mathematical treatment. The variance of the pooled (combined)
series is given by formula (6.31).


Moreover, of all the measures of dispersion, standard deviation
is affected least by fluctuations of sampling.


Thus, we see that standard deviation satisfies almost all the
properties laid down for an ideal measure of dispersion except for
the general nature of extracting the square root, which is not readily
comprehensible for a non-mathematical person. It may also be
pointed out that standard deviation gives greater weight to extreme
values and as such has not found favour with economists or
businessmen who are more interested in the results of the modal class.
Taking into consideration the pros and cons and also the wide
applications of standard deviation in statistical theory, such as in
skewness, kurtosis, correlation and regression analysis, sampling
theory and tests of significance, we may regard standard deviation
as the best and the most powerful measure of dispersion.


Remark. s.d. ≤ Range


6.9.2. Variance and Mean Square Deviation. According
to William I. Greenwald, the variance is the mean of the squared
deviations about the mean of a series. Thus, variance is the square
of the standard deviation and is denoted by $\sigma^2$. For a frequency
distribution variance is given by :

$$\sigma^2 = \frac{1}{N}\sum f\,(X - \bar{X})^2 \qquad \ldots(6.18)$$

where the symbols have already been explained in (6.17).
The mean square deviation, usually denoted by $s^2$, is defined as

$$s^2 = \frac{1}{N}\sum f\,(X - A)^2 \qquad \ldots(6.19)$$

where A is any arbitrary number.


The square root of the mean square deviation is called the root
mean square deviation and is given by :

$$s = \sqrt{\frac{1}{N}\sum f\,(X - A)^2} \qquad \ldots(6.20)$$

Relation Between $\sigma^2$ and $s^2$. Writing $X - A = (X - \bar{X}) + (\bar{X} - A)$
and noting that $\sum f\,(X - \bar{X}) = 0$, we get $s^2 = \sigma^2 + (\bar{X} - A)^2$. Hence

$$s^2 \ge \sigma^2 \qquad \ldots(6.22)$$

In other words, the mean square deviation is not less than the
variance, or the root mean square deviation is not less than the standard
deviation.
The sign of equality will hold in (6.22), i.e., $s^2 = \sigma^2$, if and
only if $\bar{X} = A$.


Thus $s^2$ will be least when $\bar{X} = A$. Hence the mean square deviation, or
equivalently the root mean square deviation, is least when deviations
are taken from the arithmetic mean, and variance (standard
deviation) is the minimum value of the mean square deviation (root mean
square deviation).
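
The relation between the mean square deviation and the variance can be verified numerically for several choices of the arbitrary point A. The data and names below are assumptions made for this illustrative sketch.

```python
def mean_square_deviation(mid_values, freqs, a):
    """Mean square deviation about the arbitrary point a, as in (6.19)."""
    n_total = sum(freqs)
    return sum(f * (x - a) ** 2 for x, f in zip(mid_values, freqs)) / n_total


mid_values = [3, 5, 7, 9]  # illustrative values
freqs = [3, 4, 2, 1]       # illustrative frequencies
mean_x = sum(f * x for x, f in zip(mid_values, freqs)) / sum(freqs)
variance = mean_square_deviation(mid_values, freqs, mean_x)  # (6.18)

for a in (4, 5, mean_x, 6, 8):
    s2 = mean_square_deviation(mid_values, freqs, a)
    # s^2 exceeds the variance by exactly the squared distance of A from the mean.
    print(a, round(s2, 4), round(variance + (mean_x - a) ** 2, 4))
```

The two printed columns agree for every A, and the smallest value occurs in the row where A equals the mean.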

6.9.3. Different Formulae for Calculating Variance. By
definition, the variance of the random variable X, denoted by $\sigma^2$ or
more precisely by $\sigma_x^2$, is given by

$$\sigma_x^2 = \frac{1}{N}\sum f\,(X - \bar{X})^2 \qquad \ldots(6.23)$$

where
$\sum f = N$ is the total frequency.

If $\bar{X}$ is not a whole number but comes out to be in fractions,
the computation of $\sigma_x^2$ by the above formula becomes very cumbersome
and time consuming. In order to overcome this difficulty we
shall develop different versions of the formula (6.23) which reduce
the arithmetical calculations to a great extent and are very useful
for numerical computation of standard deviation.


On expanding the square in (6.23) we get

$$\sigma_x^2 = \frac{1}{N}\sum fX^2 - \Big(\frac{1}{N}\sum fX\Big)^2 \qquad \ldots(6.24)$$


(6.24) is a much more convenient form to use than the formula (6.23).
But if the values of X and f are large, the computation of fX and fX² is
quite time consuming. In that case we use the step-deviation
method in which we take the deviations of the given values of X
from any arbitrary point A. Generally, 'A' is taken to be a value
lying in the middle part of the distribution, although the formula
obtained below holds for any value of A. Let

$$d = X - A$$

denote the deviations of the given values of X from A. Then we have

$$\sigma_x^2 = \frac{1}{N}\sum fd^2 - \Big(\frac{1}{N}\sum fd\Big)^2 \qquad \ldots(6.25)$$

The right hand side in (6.25) is the same as the R.H.S. of (6.24)
with X replaced by d and thus represents the variance of d, i.e., $\sigma_d^2$.
Hence we get

$$\sigma_x^2 = \sigma_d^2 \qquad \ldots(6.25a)$$

This leads to the following important conclusion :
"The variance and consequently the standard deviation of a
distribution is independent of the change of origin."
Thus, if we add (subtract) a constant to (from) each observation
of the series, its variance remains the same.


In the case of a grouped or continuous frequency distribution it is
convenient to change the scale also. Thus if h is the magnitude of
the class interval (or if h is the common factor in the values of the
variable X) then we may take

$$d = \frac{X - A}{h} \quad\Rightarrow\quad X = A + hd$$

and finally get :

$$\sigma_x^2 = h^2\Big[\frac{1}{N}\sum fd^2 - \Big(\frac{1}{N}\sum fd\Big)^2\Big] = h^2\,\sigma_d^2 \qquad \ldots(6.26)$$

which shows that variance (or s.d.) is not independent of change of
scale.


Combining the results obtained in (6.25) and (6.26) we conclude
that :


"Variance or standard deviation is independent of the change of
origin but not of the scale."
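
Both conclusions can be checked numerically. In the sketch below the grouped data (mid-values 22.5 to 47.5 with class width 5), the choice A = 32.5 and the function names are assumptions made for the illustration; the step-deviation result is compared with the direct formula.

```python
from math import sqrt


def sd_direct(mid_values, freqs):
    """Standard deviation by the definition (6.23)."""
    n_total = sum(freqs)
    mean_x = sum(f * x for x, f in zip(mid_values, freqs)) / n_total
    return sqrt(sum(f * (x - mean_x) ** 2 for x, f in zip(mid_values, freqs)) / n_total)


def sd_step_deviation(mid_values, freqs, a, h):
    """Standard deviation by the step-deviation formula (6.26), d = (X - A) / h."""
    n_total = sum(freqs)
    d_values = [(x - a) / h for x in mid_values]
    sum_fd = sum(f * d for d, f in zip(d_values, freqs))
    sum_fd2 = sum(f * d * d for d, f in zip(d_values, freqs))
    return h * sqrt(sum_fd2 / n_total - (sum_fd / n_total) ** 2)


mid_values = [22.5, 27.5, 32.5, 37.5, 42.5, 47.5]
freqs = [170, 110, 80, 45, 40, 35]

sd_a = sd_direct(mid_values, freqs)
sd_b = sd_step_deviation(mid_values, freqs, a=32.5, h=5)
sd_shifted = sd_direct([x + 100 for x in mid_values], freqs)  # change of origin
sd_scaled = sd_direct([3 * x for x in mid_values], freqs)     # change of scale
print(round(sd_a, 3), round(sd_b, 3), round(sd_shifted, 3), round(sd_scaled, 3))
```

Adding 100 to every value leaves the standard deviation unchanged, whereas multiplying every value by 3 multiplies it by 3.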


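Remarks 1. For numerical problems, a somewhat more
convenient form of (6.26) may be used. Rewriting (6.26) we get

$$\sigma_x^2 = \frac{h^2}{N^2}\Big[N\sum fd^2 - \Big(\sum fd\Big)^2\Big] \qquad \ldots(6.27)$$

$$\Rightarrow\quad \sigma_x = \frac{h}{N}\sqrt{N\sum fd^2 - \Big(\sum fd\Big)^2} \qquad \ldots(6.27a)$$

$$\Rightarrow\quad \sigma_x = h\,\sigma_d \qquad \ldots(6.27b)$$


2. If we are given $\bar{X}$ and $\sigma^2$ then we can obtain the values of
$\sum X$ and $\sum X^2$ as discussed below.


For n observations $X_1, X_2, \ldots, X_n$ we have

$$\bar{X} = \frac{1}{n}\sum X \quad\Rightarrow\quad \sum X = n\bar{X} \qquad \ldots(6.28)$$

and

$$\sigma^2 = \frac{1}{n}\sum X^2 - \bar{X}^2 \qquad [\text{From (6.24)}]$$

$$\Rightarrow\quad \sum X^2 = n\,(\sigma^2 + \bar{X}^2) \qquad \ldots(6.28a)$$

Formulae (6.28) and (6.28a) are very useful when we are
given the values of the mean and standard deviation (or variance).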
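Example 6.9. Calculate the standard deviation of the following
observations on a certain variable :


240.12, 240.13, 240.15, 240.12, 240.17,
240.15, 240.17, 240.16, 240.22, 240.21.
[I.C.W.A. (Intermediate) June 1976]
Solution.


COMPUTATION OF STANDARD DEVIATION

X            X − X̄      (X − X̄)²
240.12       −0.04       0.0016
240.13       −0.03       0.0009
240.15       −0.01       0.0001
240.12       −0.04       0.0016
240.17        0.01       0.0001
240.15       −0.01       0.0001
240.17        0.01       0.0001
240.16        0.00       0.0000
240.22        0.06       0.0036
240.21        0.05       0.0025
ΣX = 2401.60             Σ(X − X̄)² = 0.0106

$$\bar{X} = \frac{\sum X}{n} = \frac{2401.60}{10} = 240.16$$

$$\sigma^2 = \frac{1}{n}\sum (X - \bar{X})^2 = \frac{0.0106}{10} = 0.00106$$


Hence the standard deviation is given by

$$\begin{aligned}
\sigma = (0.00106)^{1/2} &= \text{Antilog}\big[\tfrac{1}{2}\log(0.00106)\big] \\
&= \text{Antilog}\big[\tfrac{1}{2}(\bar{3}.0253)\big] \\
&= \text{Antilog}\big[\tfrac{1}{2}(-3 + 0.0253)\big] = \text{Antilog}\big[\tfrac{1}{2}(-2.9747)\big] \\
&= \text{Antilog}(-1.4873) = \text{Antilog}(\bar{2}.5127) \\
&= 0.03256
\end{aligned}$$

Example 6.10. Calculate the mean and standard deviation
from the following data.


Value : 90–99 80–89 70–79 60–69 50–59 40–49 30–39
Frequency : 2 12 22 20 14 4 1
[I.C.W.A. (Intermediate), December]


Solution.
CALCULATIONS FOR MEAN AND S.D.

Class     Mid-Value    Frequency    d = (X − 64.5)/10     fd      fd²
            (X)          (f)
90–99       94.5          2               3                6       18
80–89       84.5         12               2               24       48
70–79       74.5         22               1               22       22
60–69       64.5         20               0                0        0
50–59       54.5         14              −1              −14       14
40–49       44.5          4              −2               −8       16
30–39       34.5          1              −3               −3        9
Total                  N = 75                         Σfd = 27   Σfd² = 127

$$\text{Mean} = A + \frac{h\sum fd}{N} = 64.5 + \frac{10 \times 27}{75} = 64.5 + 3.6 = 68.1$$

$$\text{S.D.} = h\sqrt{\frac{\sum fd^2}{N} - \Big(\frac{\sum fd}{N}\Big)^2} = 10\sqrt{\frac{127}{75} - \Big(\frac{27}{75}\Big)^2}$$

$$= 10 \times \sqrt{1.6933 - 0.1296} = 10 \times \sqrt{1.5637} = 10 \times 1.2505 = 12.505$$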

Example 6.11. Find the standard deviation of the following
distribution :


Age : 20–25 25–30 30–35 35–40 40–45 45–50
No. of persons : 170 110 80 45 40 35
(Take assumed average = 32.5)
[Delhi U. B.A. (Econ. Hons.) 1982]

Solution.
CALCULATIONS FOR STANDARD DEVIATION

Class     Mid-Age    No. of persons    d = (X − 32.5)/5      fd       fd²
            (X)           (f)
20–25      22.5          170                −2              −340      680
25–30      27.5          110                −1              −110      110
30–35      32.5           80                 0                 0        0
35–40      37.5           45                 1                45       45
40–45      42.5           40                 2                80      160
45–50      47.5           35                 3               105      315
Total                  N = 480                         Σfd = −220   Σfd² = 1310

$$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \Big(\frac{\sum fd}{N}\Big)^2} = 5 \times \sqrt{\frac{1310}{480} - \Big(\frac{-220}{480}\Big)^2}$$

$$= 5 \times \sqrt{2.7292 - 0.2101} = 5 \times \sqrt{2.5191} = 5 \times 1.5872 = 7.936$$


Example 6.12. A charitable organisation decided to give old
age pensions to people over sixty years of age. The scales of pension
were fixed as follows :


Age group 60–65 : Rs. 20 per month
Age group 65–70 : Rs. 25 per month
Age group 70–75 : Rs. 30 per month
Age group 75–80 : Rs. 35 per month
Age group 80–85 : Rs. 40 per month


The ages of 25 persons who secured the pension right are as
given below :
74, 62, 84, 72, 61, 83, 72, 81, 64, 71, 63, 61, 60,
67, 74, 64, 79, 73, 75, 76, 69, 68, 78, 66, 67


Calculate the monthly average pension payable per person and
the standard deviation. [Delhi Uni. B. Com. (Hons.) 1975]

Solution. First of all we shall prepare the frequency distribution
of the 25 persons with respect to age in the age-groups
60–65, 65–70, ..., 80–85 (as suggested by the above data) by
using the method of tally marks. Then we shall compute the
arithmetic mean of the pension payable per person and also its standard
deviation, as explained in the following table.


COMPUTATION OF MEAN AND STANDARD DEVIATION

Age       Tally        Frequency    Monthly pension    d = (X − 30)/5     fd     fd²
Group     marks          (f)        in Rs. (X)
60–65     ||||| ||        7              20                 −2            −14     28
65–70     |||||           5              25                 −1             −5      5
70–75     ||||| |         6              30                  0              0      0
75–80     ||||            4              35                  1              4      4
80–85     |||             3              40                  2              6     12
Total                  N = 25                                         Σfd = −9  Σfd² = 49


Average monthly pension is :

$$\bar{X} = A + \frac{h\sum fd}{N} = 30 + \frac{5 \times (-9)}{25} = 30 - 1.80 = \text{Rs. } 28.20$$

Standard deviation of monthly pension is :

$$\sigma = \frac{h}{N}\sqrt{N\sum fd^2 - \Big(\sum fd\Big)^2} = \frac{5}{25}\sqrt{25 \times 49 - (-9)^2} = \frac{1}{5}\sqrt{1225 - 81}$$

$$= \frac{1}{5}\sqrt{1144} = \frac{1}{5}\,\text{Antilog}\Big(\frac{1}{2}\log 1144\Big) = \frac{1}{5}\,\text{Antilog}\Big(\frac{1}{2} \times 3.0585\Big) = \frac{1}{5}\,\text{Antilog}(1.5292)$$

$$= \frac{1}{5} \times 33.82 = \text{Rs. } 6.76$$

Example 6.13. Find the number of items lying within mean
± σ of the following distribution.


Class      Frequency     Class      Frequency
11–12          5         21–22         395
13–14        426         23–24          38
15–16        720         25–26           8
17–18        741         27–28           5
19–20        665         29–30           7

[I.C.W.A. (Intermediate) Dec. 1982]
Solution.
COMPUTATION OF MEAN AND S.D.

Class interval    Mid-Value    d = (X − 19.5)/2      f        fd       fd²
                    (X)
11–12              11.5            −4                5       −20        80
13–14              13.5            −3              426     −1278      3834
15–16              15.5            −2              720     −1440      2880
17–18              17.5            −1              741      −741       741
19–20              19.5             0              665         0         0
21–22              21.5             1              395       395       395
23–24              23.5             2               38        76       152
25–26              25.5             3                8        24        72
27–28              27.5             4                5        20        80
29–30              29.5             5                7        35       175
Total                                         Σf = 3010  Σfd = −2929  Σfd² = 8409

$$\text{Mean} = A + \frac{h\sum fd}{N} = 19.5 + \frac{2 \times (-2929)}{3010} = 19.5 - 1.946 = 17.55$$

$$\sigma = h\sqrt{\frac{\sum fd^2}{N} - \Big(\frac{\sum fd}{N}\Big)^2} = 2 \times \sqrt{\frac{8409}{3010} - \Big(\frac{-2929}{3010}\Big)^2}$$

$$= 2 \times \sqrt{2.7937 - (0.9731)^2} = 2 \times \sqrt{2.7937 - 0.9469} = 2 \times \sqrt{1.8468} = 2 \times 1.35897 = 2.71794 \simeq 2.72$$

Mean ± σ = 17.55 ± 2.72 = 20.27 and 14.83

The number of items lying within Mean ± σ, i.e., within 14.83
and 20.27, is 720 + 741 + 665 = 2126, and the proportion of items
lying within Mean ± σ is

$$\frac{2126}{3010} = 0.7063, \ \text{i.e., } 70.63\%$$
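
The count of items within Mean ± σ in Example 6.13 can be cross-checked with a short script. This is an illustrative sketch: the variable names are assumptions, and each class is represented by its mid-value, as in the solution above.

```python
from math import sqrt

mid_values = [11.5, 13.5, 15.5, 17.5, 19.5, 21.5, 23.5, 25.5, 27.5, 29.5]
freqs = [5, 426, 720, 741, 665, 395, 38, 8, 5, 7]  # frequencies of Example 6.13

n_total = sum(freqs)
mean_x = sum(f * x for x, f in zip(mid_values, freqs)) / n_total
sigma = sqrt(sum(f * (x - mean_x) ** 2 for x, f in zip(mid_values, freqs)) / n_total)

lower, upper = mean_x - sigma, mean_x + sigma
# Add the frequencies of the classes whose mid-values fall inside the interval.
count_within = sum(f for x, f in zip(mid_values, freqs) if lower <= x <= upper)
proportion = count_within / n_total

print(round(mean_x, 2), round(sigma, 2), count_within, round(proportion, 4))
```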


Example 6.14. The mean and standard deviation of the frequency
distribution of a continuous random variable X are 40.604 lbs
and 7.92 lbs respectively. The distribution after change of origin and
scale is as follows :

d : −3 −2 −1 0 1 2 3 4 Total

f : 3 15 45 57 50 36 25 9 240
where d = (X − A)/h and f is the frequency of X. Determine the actual
class intervals.


Solution.

COMPUTATION OF MEAN AND S.D.

d        f        fd       fd²
−3       3        −9        27
−2      15       −30        60
−1      45       −45        45
 0      57         0         0
 1      50        50        50
 2      36        72       144
 3      25        75       225
 4       9        36       144
     N = Σf = 240   Σfd = 149   Σfd² = 695

We are given :
$\bar{X} = 40.604$ and $\sigma_x = 7.92$
We know that

$$\bar{X} = A + \frac{h\sum fd}{N} \quad\Rightarrow\quad 40.604 = A + h\Big(\frac{149}{240}\Big) = A + 0.621\,h \qquad \ldots(*)$$

$$\sigma_x = h\sqrt{\frac{\sum fd^2}{N} - \Big(\frac{\sum fd}{N}\Big)^2} \quad\Rightarrow\quad 7.92 = h\sqrt{\frac{695}{240} - \Big(\frac{149}{240}\Big)^2} = h\sqrt{2.8958 - (0.6208)^2}$$

$$= h\sqrt{2.8958 - 0.3854} = h\sqrt{2.5104} = 1.5844\,h$$

$$\therefore\quad h = \frac{7.92}{1.5844} = 4.99875 \simeq 5$$

Substituting in (*) we get :
A = 40.604 − 5 × 0.621 = 40.604 − 3.105
= 37.499 ≃ 37.5
The value
d = 0 ⇒ X − A = 0 ⇒ X = A = 37.5


Since the magnitude of the class interval is h = 5, the boundaries
of the corresponding class are (37.5 − 2.5, 37.5 + 2.5), i.e.,
(35, 40). Thus the actual frequency distribution is as follows :

d      X = Mid-value     Class Interval     Frequency
          of Class
−3        22.5              20–25               3
−2        27.5              25–30              15
−1        32.5              30–35              45
 0        37.5              35–40              57
 1        42.5              40–45              50
 2        47.5              45–50              36
 3        52.5              50–55              25
 4        57.5              55–60               9
Example 6.15. Complete a table showing the frequencies with
which words of different numbers of letters occur in the extract
reproduced below (omitting punctuation marks), treating as the variable the
number of letters in each word, and obtain the mean and standard
deviation of the distribution :


"Her eyes were blue : blue as autumn distance, blue as the blue
we see, between the retreating mouldings of hills and woody slopes on
a sunny September morning : a misty and shady blue, that had no
beginning or surface, and was looked into rather than at."


Solution. Here we take the variable (X) as the number of
letters in each word in the extract given above. We find that in the
extract given above there are words with number of letters ranging
from 1 to 10. Hence the variable X takes the values from 1 to 10.
The frequency distribution is easily obtained by using 'tally marks'
as follows :

No. of letters      Frequency     d = X − A      fd      fd²     Less than
in a word (X)         (f)          = X − 6                          c.f.
      1                2             −5         −10      50          2
      2                8             −4         −32     128         10
      3                9             −3         −27      81         19
      4               10             −2         −20      40         29
      5                5             −1          −5       5         34
      6                4              0           0       0         38
      7                3              1           3       3         41
      8                1              2           2       4         42
      9                3              3           9      27         45
     10                1              4           4      16         46
Total              Σf = 46                  Σfd = −76  Σfd² = 354

$$\text{Mean} = A + \frac{1}{N}\sum fd = 6 + \Big(\frac{-76}{46}\Big) = 6 - 1.65 = 4.35$$

$$\sigma^2 = \frac{1}{N}\sum fd^2 - \Big(\frac{1}{N}\sum fd\Big)^2 = \frac{354}{46} - (-1.65)^2 = 7.6956 - 2.7225 = 4.9731$$

$$\Rightarrow\quad \sigma = \sqrt{4.9731} = 2.23$$

Example 6.16. The arithmetic mean and standard deviation of a
series of 20 items were calculated by a student as 20 cm and 5 cm
respectively. But while calculating them an item 13 was misread as
30. Find the correct arithmetic mean and standard deviation.

[Delhi Uni. B. Com. (Hons.) 1972]


Solution. We are given n = 20, $\bar{X}$ = 20 cm, σ = 5 cm.
Wrong value used = 30 ; Correct value = 13


We have
$\sum X = n\bar{X} = 20 \times 20 = 400$
$\sum X^2 = n(\sigma^2 + \bar{X}^2) = 20\,(25 + 400) = 8500$
If the wrong observation 30 is replaced by the correct value
13 then the number of observations remains the same, viz., 20, and


Corrected $\sum X$ = 400 − 30 + 13 = 383
Corrected $\sum X^2$ = 8500 − (30)² + (13)² = 7769

$$\text{Corrected mean} = \frac{383}{20} = 19.15$$

$$\text{Corrected } \sigma^2 = \frac{\text{Corrected } \sum X^2}{n} - (\text{Corrected mean})^2 = \frac{7769}{20} - (19.15)^2 = 388.45 - 366.72 = 21.73$$

Corrected σ = √21.73 = 4.6615

Example 6.17. For a frequency distribution of marks in Sociology 
of 200 candidates (grouped in intervals 0–5, 5–10, ...... etc.) the 
mean and the standard deviation were found to be 45 and 15. Later it 
was discovered that the score 53 was misread as 63 in obtaining the 
frequency distribution. Find the correct mean and standard deviation 
corresponding to the correct frequency distribution. 

[I.C.W.A. (Intermediate) Dec. 1981] 


Solution. We are given n = 200, X̄ = 45 and σ = 15. These 
values have been obtained on using the wrong value 63 while the 
correct value is 53. In case of grouped or continuous frequency 
distribution, the value of X used for computing the mean and the 
standard deviation is the mid-value of the class interval. Since the 
wrong value 63 lies in the interval 60–65 with the mid-value 62.5 
and the correct value 53 lies in the interval 50–55 with mid-value 
52.5, the question amounts to finding the correct values of X̄ and σ 
if the wrong value 62.5 is replaced by the correct value 52.5. 

We have N = 200, X̄ = 45, σ = 15 
Wrong value used = 62.5 ; Correct value = 52.5 
ΣfX = NX̄ = 200 × 45 = 9000 
ΣfX² = N(σ² + X̄²) = 200(15² + 45²) = 200(225 + 2025) 
     = 200 × 2250 = 450000 
Corrected ΣfX = 9000 − 62.5 + 52.5 = 8990 
Corrected ΣfX² = 450000 − (62.5)² + (52.5)² 
     = 450000 − [(62.5)² − (52.5)²] 
     = 450000 − (62.5 + 52.5)(62.5 − 52.5) 
     = 450000 − 115 × 10 = 450000 − 1150 
     = 448850 

∴ Corrected mean = (Corrected ΣfX)/N = 8990/200 = 44.95 

Corrected σ² = (Corrected ΣfX²)/N − (Corrected mean)² 
     = 448850/200 − (44.95)² = 2244.25 − 2020.50 
     = 223.75 

⇒ Corrected σ = √223.75 = 14.96 


EXERCISE 6.3 
1. (a) What is dispersion ? Explain what you understand by absolute and 
relative dispersion. Describe some of the measures of relative dispersion known 
to you. 
(b) "The standard deviation is the most precise and the most satisfactory 
measure of dispersion." Explain this statement by comparing the standard 
deviation with other measures of dispersion. 
(Bombay Uni. B. Com. 1975) 


2. Compare standard deviation with mean absolute deviation as a 
measure of dispersion. What are the relative measures based on these two ? 
(Bombay Uni. B. Com. Nov. 1982) 


3. What are the chief requisites of a good measure of dispersion ? In 
the light of those, comment on some of the well-known measures of dispersion. 
(Kurukshetra Uni. B. Com. II, Sep. 1982) 
4. Explain why standard deviation is regarded superior to other 
measures of dispersion ? 
[Punjab Uni. B. A. (Econ. Hons. II), April 1983] 
5. What do you understand by absolute and relative measures of dispersion ? 
Explain advantages of the relative measures over the absolute measures 
of dispersion. 
(Bombay Uni. B. Com. May 1980) 


6. Define mean deviation and standard deviation. Explain why economists 
prefer mean deviation to standard deviation in their analysis. 
[Madurai Uni. M.A. (Econ.) 1976] 

7. What is standard deviation ? Explain its superiority over other 
measures of dispersion. 

8, State giving reasons whether the following statements are true or 
false : 

(i) Standard deviation can never be negative. 


(ii) The sum of squared deviations measured from mean is least. 
(Delhi U. B. Com. 1983) 


Ans. (i) True (ii) True 


9. Calculate the standard deviation from the following data : 


Marks in Cost No. of Marks in Cost No. of 
Accounting Students Accounting Students. 


0—10 5 40—50 9 
10—20 7 50—60 6 
20—30 14 60—70 2 


30—40 12 
(Osmania Uni. B. Com. III, April, 1984) 


Ans. s.d. = 15.57 
10. Calculate the standard deviation of the following series. 


Weekly Wages of No. of Weekly Wages of No, of 
Workers (In Rs.) Workers Workers (in Rs) Workers 


100 —105 200 130—135 410 
105—110 210 135—140 320 
110—115 230 140—145 280 
115—120 320 145—150 210 
120—125 350 150—155 160 
125—130 520 155—160 90 


[Delhi Uni. B. Com. 1975] 
Ans. s.d. = 14.244 

11. Find out the mean and standard deviation of the following data. 

Age under (years)     :  10   20   30   40   50    60    70    80 
No. of persons dying  :  15   30   53   75   100   110   115   125 
(Nagarjuna U. B. Com. April, 1980) 


Ans. Mean = 35.16 years, s.d. = 19.76 years. 


12. In the following data, two class frequencies are missing. 


C.I.         Frequency      C.I.         Frequency 
100–110          4          150–160          ? 
110–120          7          160–170         16 
120–130         15          170–180         10 
130–140          ?          180–190          6 
140–150         40          190–200          3 


However, it was possible to ascertain that the total number of 
frequencies was 150 and that the median has been correctly found out 
as 146.25. You are required to find with the help of information 
given : 

(i) The two missing frequencies. 

(ii) Having found the missing frequencies, calculate Arithmetic Mean 
and Standard Deviation. 

(iii) Without using the direct formula find the value of mode. 

[C.A. (Intermediate) May 1973] 

Ans. (i) 24, 25 (ii) A.M. = 147.33, s.d. = 19.2 (iii) Mode = 144.09 


13. The following table gives the distribution of income of households : 


Income           Percentage of      Income             Percentage of 
(Rs.)            households         (Rs.)              households 
Under 100             7             500–599                14.9 
100–199              11.7           600–699                10.4 
200–299              12.1           700–999                 9.0 
300–399              14.8           1000 and above          4.0 
400–499              15.9 

(i) What are the problems involved in computing standard deviation 
from the above data ? 
(ii) Compute a suitable measure of dispersion. 
[Delhi Uni. B.A. Econ. (Hons.), 1978] 
Ans. (ii) Compute Quartile Deviation. Q.D. = 169.425 ; 
Coeff. of Q.D. = 0.404 


14. (a) The standard deviation calculated from a set of 32 observations 
is 5. If the sum of the observations is 80, what is the sum of the squares of these 
observations ? [I.C.W.A. Final Jan. 1971 (O.S.)] 

Ans. ΣX² = 1000 

(b) The mean of 200 items is 48 and their standard deviation is 3. 
Find the sum and sum of squares of all items. 

[Delhi Uni. B. Com. (Hons.), 1978] 


Ans. 9,600 ; 4,62,600 

(c) Given : 
No. of observations (N) = 100 
Arithmetic average (X̄) = 2 
Standard deviation (σ) = 4 


Find ΣX and ΣX². 
[Delhi Uni. B.A. (Econ. Hons. I) (O.C.), 1983] 
Ans. ΣX = 200, ΣX² = 2000 


15. An association doing charity work decided to give old age pensions 
to people of 60 years and above in age. The scales of pension were fixed as 
follows : 
Age group 60—65 ; Rs. 40 per month 
Age group 65—70 ; Rs. 50 per month 
Age group 70—75 ; Rs. 60 per month 
Age group 75—80 ; Rs. 70 per month 
Age group 80—85 ; Rs. 80 per month 
85 and above ; Rs. 100 per month. 
The ages of 30 persons who secured the pension right are given below : 


62. 65.5 68 72. 715. 771 "8201-85 77000678 
75 61 60 68 72 76 78 79 80 82 
68 75 94 98 73 77 68 65 71 89 
Calculate the monthly average pension payable and the standard deviation. 


Ans. Average pension = Rs. 67.67 ; s.d. = Rs. 18.38 


16. Treating the number of letters in each word in the following passage 
as the variable X, prepare the frequency distribution table and obtain its mean, 
median, mode and standard deviation. 


“The reliability of data must always be examined before any attempt is 
made to base conclusions upon them. This is true of all data, but particularly 
so of numerical data, which do not carry their quality written large on them, It 
is a waste of time to apply the refined theoretical methods of Statistics to data 
which are suspect from the beginning.” 


Ans. Mean = 4.565, Median = 4, Mode = 3, S.D. = 2.673. 

17. A collar manufacturer is considering the production of a new style 
of collar to attract young men. The following statistics of neck circumferences 
are available based on measurements of a typical group : 

Mid-value       No. of         Mid-value       No. of 
(in inches)     students       (in inches)     students 
12.0               4           14.5              18 
12.5               8           15.0              10 
13.0              13           15.5               7 
13.5              20           16.0               5 
14.0              25 


Use the criterion X̄ ± 3σ to obtain the largest and the smallest size of 
collar he should make in order to meet the needs of practically all his customers, 
having in mind that collars are worn on an average 3/4 inch larger than neck 
size. 


(Delhi Uni. M.B.A. April 1976) 
Hint. Mean = 13.968″ ; S.D. = 0.964″. 
Limits for collar size are given by : (Mean ± 3 s.d.) + 3/4 
Largest collar size = 14.718 + 2.892 = 17.61″ 
Smallest collar size = 14.718 − 2.892 = 11.826″. 


18. Calculate the arithmetic mean and the standard deviation of the 
values of the world's annual gold output (in millions of pounds) for 20 different 
years : 

94   95   96   93   87   79    73    69    68    67 
78   82   83   89   95   103   108   117   130   97 

Also calculate the percentage of cases lying outside the mean at distances 
±σ, ±2σ, ±3σ, where σ denotes the standard deviation. 

Ans. Mean = 90.15, s.d. = 15.99 approx. 


Percentage lying outside M ± σ, M ± 2σ, M ± 3σ is given by 

(7/20) × 100, (1/20) × 100, (0/20) × 100, i.e., 35%, 5% and 0% respectively. 
Hint. M ± σ = 106.14, 74.16. The number of observations lying outside 
these limits is 7 out of 20. Hence required percentage = (7/20) × 100 = 35. 


19. The following data represent the percentage impurities in a certain 
chemical substance. 


Percentage of Frequency Percentage of Frequency 
impurities impurities 
Less than 5 0 ` 10—109 45 
5—59 1 11—11-9 30 
6—69 6 12—129 5 
as A 13—139 3 
—9 5 14—14-9 1 
9—99 85 
280 


(i) Calculate the standard deviation. 

(ii) Find the number of frequency lying between (A.M. ± 2 S.D.) 
[I.C.W.A. (Intermediate), Dec. 1983] 

Ans. (i) 1.3924 (ii) 267 

20. (a) The following distribution was obtained by a change of origin 
and scale of variable X : 


d: —4 —3 —2 -1 0 1 2 3 4 
f: 4 8 14 18 20 14 10 6 6 
Write down the frequency distribution of X if it is given that mean and 
variance are 59.5 and 413 respectively. (Bombay Uni. B. Com. Nov. 1975) 
Ans.  C.I.            f        C.I.             f 
      15.5–25.5       4        65.5–75.5       14 
      25.5–35.5       8        75.5–85.5       10 
      35.5–45.5      14        85.5–95.5        6 
      45.5–55.5      18        95.5–105.5       6 
      55.5–65.5      20 
                               Total          100 


(b) Mean and standard deviation of the following continuous series are 
31 and 15.9 respectively. The distribution after taking step deviation is as 
follows : 


d :   −3   −2   −1    0    1    2    3 
f :   10   15   25   25   10   10    5 


Determine the actual class intervals. 
[Delhi Uni. B. Com. (Hons.) 1974] 
Ans. 0–10, 10–20, 20–30, 30–40, 40–50, 50–60, 60–70. 


21. (a) The mean and standard deviation of a sample of 100 observations 
were calculated as 40 and 5.1 respectively by a student who took by 
mistake 50 instead of 40 for one observation. Calculate the correct mean and 
standard deviation. [I.C.W.A. (Intermediate) June 1982] 


Ans. Corrected mean = 39.9 and s.d. = 5. 


(b) For a number of 51 observations, the arithmetic mean and standard 
deviation are 58.5 and 11 respectively. It was found after the calculations were 
made that one of the observations recorded as 15 was incorrect. Find the 
standard deviation of the 50 observations if this incorrect observation is 
omitted. [I.C.W.A. (Intermediate) Dec., 1980] 


Ans. σ = 9.21 

(c) The analysis of the results of a budget survey of 150 families gave an 
average monthly expenditure of Rs. 120 on food items with a standard deviation 
of Rs. 15. After the analysis was completed it was noted that the figure recorded 
for one household was wrongly taken as Rs. 15 instead of Rs. 105. Determine 
the correct value of the average expenditure and its standard deviation. 

[C.A. (Final) Nov. 1978] 

Ans. Corrected mean = 120.6 ; Corrected s.d. = 12.4 

22. The mean and standard deviation of 20 items are found to be 10 and 
2 respectively. At the time of checking it was found that one item 8 was 
incorrect. Calculate the mean and standard deviation if 

(i) the wrong item is omitted, and 

(ii) it is replaced by 12. [I.C.W.A. (Intermediate) Dec. 1977] 

Ans. (i) Mean = 10.1053, s.d. = 1.997 ; (ii) Mean = 10.2, s.d. = 1.99. 

23. A study of the age of 100 film stars grouped in intervals of 10–12, 
12–14, ... etc., revealed the mean age and standard deviation to be 32.02 
and 13.18 respectively. While checking it was discovered that the age 57 was 
misread as 27. Calculate the correct mean age and standard deviation. 

[Punjab Uni. M.A. (Econ.) 1973] 

Ans. Mean = 32.32 ; s.d. = 13.402 

24. The mean and standard deviation of a set of 100 observations were 
found to be 40 and 12 respectively. On checking it was found that 2 observations 
were wrongly taken as 23 and 15 in place of 43 and 18. Calculate the correct 
mean and standard deviation. 


Ans. X̄ = 40.23 ; σ = 11.82. 
25. Fill in the blanks : 
(i) Algebraic sum of deviations is zero from...... 
(ii) The sum of absolute deviations is minimum from...... 
(iii) Standard deviation is always...... than range. 
(iv) Standard deviation is always...... than mean deviation. 
(v) The mean and s.d. of 100 observations are 50 and 10 respectively. 
The new : 
(a) Mean=......, s.d.=......, if 2 is added to each observation. 
(b) Mean=......, s.d.=......, if 3 is subtracted from each observation. 

(c) Mean=......, s.d.=......, if each observation is multiplied by 5. 

(d) If 2 is subtracted from each observation and then it is divided by 5. 

(vi) Variance is the......value of mean square deviation. 

(vii) If Q₁=10, Q₃=40, the coefficient of quartile deviation is...... 

(viii) If 25% of the items in a distribution are less than 10 and 25% are 
more than 40, the quartile deviation is...... 

(ix) The median and s.d. of a distribution are 15 and 5 respectively. If 
each item is increased by 5, the new median=......and s.d.=...... 

(x) A computer showed that the s.d. of 40 observations ranging from 
120 to 150 is 35. The answer is Correct/Wrong. Tick the right one. 

Ans. (i) Arithmetic mean (ii) Median (iii) Less (iv) Greater 
(v) (a) 52, 10 (b) 47, 10 (c) 250, 50 (d) 9.6, 2 (vi) Minimum (vii) 0.6 (viii) 15 
(ix) 20, 5 (x) Wrong, since s.d. can't exceed range. 

6.10. Standard Deviation of the Combined Series. As 
already pointed out, standard deviation is suitable for algebraic 
manipulations, i.e., if we are given the averages, the sizes and the 
standard deviations of a number of series, then we can obtain the 
standard deviation of the resultant series obtained on combining the 
different series. Thus if 

σ₁, σ₂, ..., σₖ are the standard deviations ; 

X̄₁, X̄₂, ..., X̄ₖ are the arithmetic means ; 
and n₁, n₂, ..., nₖ are the sizes, 


of k series respectively, then the standard deviation σ of the combined 
series of size N = n₁ + n₂ + ... + nₖ is given by the formula 


Nσ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) + ... + nₖ(σₖ² + dₖ²)        ...(6.29) 


where d₁ = X̄₁ − X̄, d₂ = X̄₂ − X̄, ..., dₖ = X̄ₖ − X̄        ...(6.29a) 

and X̄ = (n₁X̄₁ + n₂X̄₂ + ... + nₖX̄ₖ)/(n₁ + n₂ + ... + nₖ)        ...(6.29b) 

is the mean of the combined series. 
Thus the standard deviation of the combined series is given 
by : 

σ = √{ [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) + ... + nₖ(σₖ² + dₖ²)] / (n₁ + n₂ + ... + nₖ) }        ...(6.30) 

In particular, for two series we get from (6.29) : 

(n₁ + n₂)σ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)        ...(6.31) 

where 

d₁ = X̄₁ − X̄ = X̄₁ − (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = n₂(X̄₁ − X̄₂)/(n₁ + n₂) 

and d₂ = X̄₂ − X̄ = X̄₂ − (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = n₁(X̄₂ − X̄₁)/(n₁ + n₂) 


Rewriting (6.31) and substituting the values of d₁ and d₂ 
we get : 

(n₁ + n₂)σ² = n₁σ₁² + n₂σ₂² + n₁n₂(X̄₁ − X̄₂)²/(n₁ + n₂) 

⇒ σ² = [n₁σ₁² + n₂σ₂² + n₁n₂(X̄₁ − X̄₂)²/(n₁ + n₂)] / (n₁ + n₂)        ...(6.31a) 


Thus for two series, the formula (6.31a) can be used with 
convenience, since all the values are already given. 

Example 6.18. The means of two samples of sizes 50 and 100 
respectively are 54.1 and 50.3 and the standard deviations are 8 and 7. 
Obtain the standard deviation of the sample of size 150 obtained by 
combining the two samples. 

[I.C.W.A. (Intermediate) Dec. 1977] 


Solution. In the usual notations we are given : 

n₁ = 50, n₂ = 100, X̄₁ = 54.1, X̄₂ = 50.3, σ₁ = 8, σ₂ = 7. The mean 
X̄ of the combined sample of size 150 obtained on pooling the two 
samples is given by : 

X̄ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = (50 × 54.1 + 100 × 50.3)/(50 + 100) 
  = (2705 + 5030)/150 = 7735/150 = 51.57 

d₁ = X̄₁ − X̄ = 54.10 − 51.57 = 2.53 
d₂ = X̄₂ − X̄ = 50.30 − 51.57 = −1.27 


Hence the variance σ² of the combined sample of size 150 is 
given by : 
(n₁ + n₂)σ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) 
⇒ 150σ² = 50[8² + (2.53)²] + 100[7² + (−1.27)²] 
        = 50(64 + 6.4009) + 100(49 + 1.6129) 
        = 3520.045 + 5061.29 = 8581.335 
⇒ σ² = 8581.335/150 = 57.2089 
⇒ σ = √57.2089 = 7.5637 
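
Formulae (6.29) to (6.30) translate directly into a few lines of code. The sketch below is illustrative only; the function name `combined_mean_sd` is an assumption of this note, and the function accepts any number of series.

```python
from math import sqrt


def combined_mean_sd(sizes, means, sds):
    """Mean and s.d. of the series obtained by pooling k series."""
    n = sum(sizes)
    # Combined mean: weighted mean of the component means.
    mean = sum(ni * mi for ni, mi in zip(sizes, means)) / n
    # N * sigma^2 = sum of n_i * (sigma_i^2 + d_i^2), with d_i = mean_i - mean.
    total = sum(ni * (si ** 2 + (mi - mean) ** 2)
                for ni, mi, si in zip(sizes, means, sds))
    return mean, sqrt(total / n)


# Data of Example 6.18: two samples of sizes 50 and 100.
mean, sd = combined_mean_sd([50, 100], [54.1, 50.3], [8, 7])
print(round(mean, 2), round(sd, 2))
```

The program keeps the combined mean unrounded, so its result agrees with the hand computation to two decimal places (about 7.56) rather than digit for digit.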
Example 6.19. For a group containing 100 observations, the 
arithmetic mean and the standard deviation are 8 and √10.5 respectively. 
For 50 observations selected from these 100 observations, the 
mean and standard deviation are 10 and 2 respectively. Calculate 
values of the mean and standard deviation for the other half. 
[Delhi U. B.A. (Econ. Hons. I) 1984 ; 
Bombay Uni. B. Com. May 1980] 
Solution. In the usual notations we are given : 
n = 100, X̄ = 8, σ = √10.5 ⇒ σ² = 10.5 ; 
n₁ = 50, X̄₁ = 10, σ₁ = 2 ; n₂ = 100 − 50 = 50 
We want X̄₂ and σ₂. 

X̄ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) ⇒ 8 = (50 × 10 + 50 × X̄₂)/100 

⇒ 800 = 500 + 50X̄₂ 

⇒ 50X̄₂ = 800 − 500 = 300 

⇒ X̄₂ = 300/50 = 6 

d₁ = X̄₁ − X̄ = 10 − 8 = 2 ⇒ d₁² = 4 
d₂ = X̄₂ − X̄ = 6 − 8 = −2 ⇒ d₂² = 4 


(n₁ + n₂)σ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) 

⇒ 100 × 10.5 = 50(4 + 4) + 50(σ₂² + 4) 

⇒ 1050 = 400 + 50σ₂² + 200 

⇒ 50σ₂² = 1050 − 600 = 450 

⇒ σ₂² = 450/50 = 9 ⇒ σ₂ = 3, 

since s.d. is always positive. 
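
The reverse problem solved in this example, namely recovering the mean and s.d. of the remaining part from those of the whole and of one part, can be sketched as follows. The function name `remaining_group` is an illustrative assumption; variances rather than standard deviations are passed in so that √10.5 need not be rounded.

```python
from math import sqrt


def remaining_group(n, mean, var, n1, mean1, var1):
    """Size, mean and s.d. of the part left after removing a known sub-group."""
    n2 = n - n1
    # From n * mean = n1 * mean1 + n2 * mean2.
    mean2 = (n * mean - n1 * mean1) / n2
    d1, d2 = mean1 - mean, mean2 - mean
    # From n * var = n1 * (var1 + d1^2) + n2 * (var2 + d2^2).
    var2 = (n * var - n1 * (var1 + d1 ** 2)) / n2 - d2 ** 2
    return n2, mean2, sqrt(var2)


# Whole group: n = 100, mean 8, variance 10.5; known half: n1 = 50, mean 10, variance 4.
print(remaining_group(100, 8, 10.5, 50, 10, 4))
```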
Example 6.20. Calculate the standard deviation of the combined 
group of 500 items from the following data : 

                       Group I     Group II     Group III 
No. of items :           100         150          250 
Arithmetic mean :         50          55           60 
Variance :               100         121          144 


(Bombay U. B. Com. May 1982) 
Solution. In the usual notations, we are given : 
n₁ = 100, n₂ = 150, n₃ = 250, 
X̄₁ = 50, X̄₂ = 55, X̄₃ = 60, 
σ₁² = 100, σ₂² = 121, σ₃² = 144. 
The arithmetic mean of the combined group of 500 items is 
given by : 

X̄ = (n₁X̄₁ + n₂X̄₂ + n₃X̄₃)/(n₁ + n₂ + n₃) 
  = (100 × 50 + 150 × 55 + 250 × 60)/(100 + 150 + 250) 
  = (5000 + 8250 + 15000)/500 = 28250/500 = 56.5 

d₁ = X̄₁ − X̄ = 50 − 56.5 = −6.5 ⇒ d₁² = 42.25 
d₂ = X̄₂ − X̄ = 55 − 56.5 = −1.5 ⇒ d₂² = 2.25 
d₃ = X̄₃ − X̄ = 60 − 56.5 = 3.5 ⇒ d₃² = 12.25 

The variance σ² of the combined series of 500 items is given 
by : 

σ² = [n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) + n₃(σ₃² + d₃²)]/(n₁ + n₂ + n₃) 
   = (1/500)[100(100 + 42.25) + 150(121 + 2.25) + 250(144 + 12.25)] 
   = (1/500)(100 × 142.25 + 150 × 123.25 + 250 × 156.25) 
   = (1/500)(14225 + 18487.50 + 39062.50) 
   = 71775/500 = 143.55 
⇒ σ = √143.55 = 11.98 
Example 6.21. Find the missing information from the following : 

                        Group I    Group II    Group III    Combined 
Number                    50          ?           90          200 
Standard Deviation         6          7            ?          7.746 
Mean                     113          ?          115          116 


[Delhi Uni. B. Com. (Hons.) 1973] 
Solution. 

                        Group I    Group II    Group III    Combined Group 
Number                  n₁ = 50    n₂ = ?      n₃ = 90      n₁ + n₂ + n₃ = 200 
s.d.                    σ₁ = 6     σ₂ = 7      σ₃ = ?       σ = 7.746 
Mean                    X̄₁ = 113   X̄₂ = ?      X̄₃ = 115     X̄ = 116 


Thus we have three unknown values viz., n₂, σ₃ and X̄₂. To 
determine these three values we need three equations which are 
given below : 

n₁ + n₂ + n₃ = 200        ...(i) 

X̄ = (n₁X̄₁ + n₂X̄₂ + n₃X̄₃)/(n₁ + n₂ + n₃)        ...(ii) 

(n₁ + n₂ + n₃)σ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²) + n₃(σ₃² + d₃²)        ...(iii) 


From (i) we get : 

n₂ = 200 − (n₁ + n₃) = 200 − (50 + 90) = 60        ...(iv) 


Using (ii) we get : 

200 × 116 = 50 × 113 + 60X̄₂ + 90 × 115 

⇒ 23200 = 5650 + 60X̄₂ + 10350 

⇒ 60X̄₂ = 23200 − (5650 + 10350) 
        = 23200 − 16000 = 7200 

⇒ X̄₂ = 7200/60 = 120        ...(v) 

Now 

d₁ = X̄₁ − X̄ = 113 − 116 = −3 

d₂ = X̄₂ − X̄ = 120 − 116 = 4 

d₃ = X̄₃ − X̄ = 115 − 116 = −1 

Substituting these values in (iii) we get : 

200 × (7.746)² = 50(36 + 9) + 60(49 + 16) + 90(σ₃² + 1) 

⇒ 200 × 60.000516 = 50 × 45 + 60 × 65 + 90 + 90σ₃² 

⇒ 12000 = 2250 + 3900 + 90 + 90σ₃² 

⇒ 90σ₃² = 12000 − 6240 = 5760 

⇒ σ₃² = 5760/90 = 64 ⇒ σ₃ = 8 

Hence the unknown constants are 
n₂ = 60, X̄₂ = 120 and σ₃ = 8 


We have already discussed the relative measures of dispersion 
based on range, quartile deviation and mean deviation. Since 
standard deviation is by far the best measure of dispersion, for 
comparing the homogeneity or heterogeneity of two or more 
distributions we generally compute the coefficient of standard 
deviation, which is given by : 

Coefficient of Standard Deviation = σ/X̄        ...(6.32) 


100 times the coefficient of dispersion based on standard 
deviation is called the coefficient of variation, abbreviated as C.V. 
Thus, 

C.V. = 100 × σ/X̄        ...(6.33) 


According to Professor Karl Pearson, who suggested this measure, 
"coefficient of variation is the percentage variation in mean, 
standard deviation being considered as the total variation in the 
mean". 


For comparing the variability of two distributions we compute 
the coefficient of variation for each distribution. A distribution 
with smaller C.V. is said to be more homogeneous or uniform or 
less variable than the other and the series with greater C.V. is said 
to be more heterogeneous or more variable than the other. 
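
As an illustration of how the coefficient of variation is used for comparison, a minimal sketch follows. The function name and the two series in it are hypothetical values chosen only for this note; they are not taken from any example in the text.

```python
def coefficient_of_variation(mean, sd):
    """C.V. = 100 * s.d. / mean, expressed as a percentage."""
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return 100 * sd / mean


# Hypothetical series: (mean, s.d.)
series = {"P": (50, 5), "Q": (20, 3)}

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in series.items()}
more_uniform = min(cv, key=cv.get)  # the smaller C.V. marks the more homogeneous series
print(cv, more_uniform)
```

Here series P has the larger standard deviation in absolute terms, yet it is the more uniform of the two once the dispersion is related to the mean.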


Remark. Some authors define coefficient of variation as the 
"coefficient of standard deviation expressed as a percentage". For 
example, if mean = 15 and s.d. = 3, then 

C.V. = S.D./Mean = 3/15 = 0.20 

⇒ C.V. = 20%. 

Example 6.22. Comment on the following : 
For a set of 10 observations : 
mean = 5, s.d. = 2 and C.V. = 60%. 
[I.C.W.A. (Final) Dec., 1980] 
Solution. We are given : 
n = 10, X̄ = 5 and σ = 2 

C.V. = 100σ/X̄ = (100 × 2)/5 = 40% 

But we are given C.V. = 60%. Hence the given statement is 
wrong. 
Example 6.23. If N = 10, X̄ = 12, ΣX² = 1530, find the 
coefficient of variation. [Gujarat U. B. Com. Oct. 1975] 
Solution. We have : 

σ² = (1/N) ΣX² − X̄² = 1530/10 − (12)² = 153 − 144 = 9 
⇒ σ = 3 
[Negative sign is rejected since s.d. is always non-negative] 

C.V. = 100 × σ/X̄ = (100 × 3)/12 = 25 
⇒ C.V. = 25%. 
Example 6.24. The arithmetic mean of runs scored by three 
batsmen, Vijay, Subhash and Kumar in the same series of 10 innings 
are 50, 48 and 12 respectively. The standard deviation of their runs 
are respectively 15, 12 and 2. Who is the most consistent of the three ? 
If one of the three is to be selected, who will be selected ? 


[Bombay Uni. B. Com. April 1978] 


Solution. Let X̄₁, X̄₂, X̄₃ be the means and σ₁, σ₂, σ₃ the 
standard deviations of the runs scored by Vijay, Subhash and 
Kumar respectively. Then we are given : 

X̄₁ = 50, X̄₂ = 48, X̄₃ = 12 

and σ₁ = 15, σ₂ = 12, σ₃ = 2 

C.V. of runs scored by Vijay = 100σ₁/X̄₁ = (100 × 15)/50 = 30 

C.V. of runs scored by Subhash = 100σ₂/X̄₂ = (100 × 12)/48 = 25 

C.V. of runs scored by Kumar = 100σ₃/X̄₃ = (100 × 2)/12 = 16.67 

The decision regarding the selection of player may be based 
on two considerations : 


(i) If we want a consistent player (which is a statistically 
sound decision), then Kumar is to be selected, since C.V. of the 
runs is smallest for Kumar. 


(ii) If we want to select a player whose expected score is the 
highest, then Vijay will be selected. 
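
The two selection criteria of this example can be put side by side in a short program. This is only a sketch; the dictionary layout and variable names are choices made for this note.

```python
def coefficient_of_variation(mean, sd):
    """C.V. = 100 * s.d. / mean, expressed as a percentage."""
    return 100 * sd / mean


# (mean runs, s.d. of runs) for each batsman, as given in Example 6.24.
batsmen = {"Vijay": (50, 15), "Subhash": (48, 12), "Kumar": (12, 2)}

cv = {name: coefficient_of_variation(m, s) for name, (m, s) in batsmen.items()}

most_consistent = min(cv, key=cv.get)                             # smallest C.V.
highest_average = max(batsmen, key=lambda name: batsmen[name][0])  # largest mean

print({name: round(value, 2) for name, value in cv.items()})
print(most_consistent, highest_average)
```

The two criteria pick different players, which is why the solution states the basis of selection before naming anyone.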


Example 6.25. For a distribution, the coefficient of variation 
is 22.5% and the value of arithmetic average is 7.5. Find out the 
value of standard deviation. 


(Delhi Uni. B. Com. 1983) 
Solution. We are given : C.V. = 22.5%, X̄ = 7.5 

C.V. = (100 × σ)/X̄ ⇒ σ = (C.V. × X̄)/100 

⇒ σ = (22.5 × 7.5)/100 = 168.75/100 = 1.6875 ≈ 1.69 


Example 6.26. Coefficients of variation of two series are 75% 
and 90% and their standard deviations 15 and 18 respectively. Find 
their means. 


[Delhi Uni. B.A. (Econ. Hons.) 1982] 


Solution. We have : 

C.V. = (100 × σ)/X̄ ⇒ X̄ = (100 × σ)/C.V. 

Mean of I series = (100 × 15)/75 = 20 

Mean of II series = (100 × 18)/90 = 20 


Example 6.27. "After settlement the average weekly wage in a 
factory had increased from rupees 8 to 12 and the standard deviation 
had increased from rupee 1 to 1.5. After settlement the wage has 
become higher and more uniform." Comment. 

[Delhi U. B. A. (Econ. Hons.) 1980] 


Solution. It is given that after settlement the average weekly 
wages of workers have gone up from Rs. 8 to Rs. 12. This implies 
that the total wages received per week by all the workers together 
have increased. However, we cannot conclude that the wage of each 
individual has increased. 
Regarding uniformity of the wages we have to calculate the 
coefficient of variation of the wages of workers before the settlement 
and after the settlement. 

C.V. of wages before the settlement = (100 × 1)/8 = 12.5 

C.V. of wages after the settlement = (100 × 1.5)/12 = 12.5 


Since the coefficient of variation of wages before the settlement 
and after the settlement is same, there is no change in the variability 
of distribution of wages after the settlement. Hence it is wrong to 
say that the wages have become more uniform (less variable) after 
the settlement. 


Example 6.28. Two workers on the same job show the following 
results over a long period of time : 


                                    Worker A     Worker B 
Mean time of completing 
the job (minutes)                      30           25 
Standard deviation (minutes)            6            4 


(i) Which worker appears to be more consistent in the time he 
requires to complete the job ? 


(ii) Which worker appears to be faster in completing the job ? 
Explain. 

[Osmania U. B. Com. (Hons.) April 1983 ; 

Delhi Uni. B. Com. (Hons.) 1974] 


Solution. (i) We know 

C.V. = (σ/X̄) × 100 

∴ C.V. (for worker A) = (100 × 6)/30 = 20 
   C.V. (for worker B) = (100 × 4)/25 = 16 


Since C.V. (B) is less than C.V. (A), the worker B appears to 
be more consistent in the time he requires to complete the job. 


(ii) Since X̄(B) < X̄(A), i.e., on the average the worker B takes 
less time than worker A to complete the job, the worker B appears 
to be faster in completing the job. 
Example 6.29. In two factories A and B engaged in the same 
industry in an area, the average weekly wages in Rs. and the standard 
deviations are as follows : 


Factory Average S.D. No. of Wage 
Weekly Wage Earners 
A 34.5 5.0 476 
В 28.5 4.5 524 


(a) Which factory, A or B, pays out larger amount as weekly 
wages ? 


(b) What is the average wage of all the workers in the two factories 
taken together ? 

(c) What is the coefficient of variation in case of each factory 
separately ? What inference do you draw from a comparison of the two 
figures ? 

[Guru Nanak Dev U. B. Com. 1978 ; Delhi U. B. Com. (Hons.) 1977] 
Solution. Let n₁, n₂ denote the number of employees ; X̄₁, X̄₂ 
the average weekly wages and σ₁, σ₂ the standard deviations of the 
wages of the employees in the factories A and B respectively. Then 
we are given : 
n₁ = 476, X̄₁ = 34.5, σ₁ = 5.0 
n₂ = 524, X̄₂ = 28.5, σ₂ = 4.5 
(a) We know that : 

Average wages of workers = (Total wages paid by factory)/(No. of workers in it) 

⇒ Total wages paid by factory = (Average wages of workers) 
                                 × (No. of workers in it) 


Total weekly wages paid by factory A 
= n₁X̄₁ = 476 × 34.5 = Rs. 16,422 
Total weekly wages paid by factory B 
= n₂X̄₂ = 524 × 28.5 = Rs. 14,934 
Hence factory A pays out larger amount as weekly wages to 
its employees. 
(b) The average wage of all the workers in the two factories 
taken together is given by 

X̄ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = (476 × 34.5 + 524 × 28.5)/(476 + 524) 
  = (16422 + 14934)/1000 = 31356/1000 = Rs. 31.356 

(c) C.V. of the distribution of weekly wages for factory A is 
given by : 

C.V. (A) = 100σ₁/X̄₁ = (100 × 5)/34.5 = 14.49 

Similarly, 

C.V. (B) = 100σ₂/X̄₂ = (100 × 4.5)/28.5 = 15.79 


Since C.V. (B) is greater than C.V. (A), the factory B has 
greater variability in individual wages. 


Example 6.30. An analysis of the monthly wages paid to 
workers in two firms A and B, belonging to the same industry, gives 
the following results : 

                                                   Firm A       Firm B 
Number of wage earners                               550          650 
Average monthly wages                              Rs. 50       Rs. 45 
Standard deviation of the distribution of wages    Rs. √90      Rs. √120 


What are the measures of (i) average monthly wages and 
(ii) standard deviation in the distribution of individual wages of all 
workers in the two firms taken together ? 

[I.C.W.A. (Intermediate) June 1977] 


Solution. Let n₁, n₂ denote the sizes ; X̄₁, X̄₂ the means and 
σ₁, σ₂ the standard deviations of the monthly wages (in Rs.) of the 
workers in the firms A and B respectively. Then we are given : 

n₁ = 550, X̄₁ = 50, σ₁ = √90 ⇒ σ₁² = 90 
n₂ = 650, X̄₂ = 45, σ₂ = √120 ⇒ σ₂² = 120 


(i) The average monthly wage, say, X̄, of all the workers 
in the two firms A and B taken together is given by : 

X̄ = (n₁X̄₁ + n₂X̄₂)/(n₁ + n₂) = (550 × 50 + 650 × 45)/(550 + 650) 
  = (27500 + 29250)/1200 = 56750/1200 = Rs. 47.29 


(ii) The variance σ² of the distribution of monthly wages of 
all the workers in the two firms A and B taken together is given by : 


(n₁ + n₂)σ² = n₁(σ₁² + d₁²) + n₂(σ₂² + d₂²)        ...(*) 
Now : 
d₁ = X̄₁ − X̄ = 50 − 47.29 = 2.71 ⇒ d₁² = 7.344 
d₂ = X̄₂ − X̄ = 45 − 47.29 = −2.29 ⇒ d₂² = 5.244 

Substituting in (*) we get : 

1200σ² = 550(90 + 7.344) + 650(120 + 5.244) 
       = 550 × 97.344 + 650 × 125.244 
       = 53539.2 + 81408.6 = 134947.8 

⇒ σ² = 134947.8/1200 = 112.4565 

⇒ σ = √112.4565 = Rs. 10.60 


Example 6.31. Goals scored by two teams A and B in a football 
season were as follows : 

[Table : number of goals scored in a match against the number of 
matches for teams A and B ; the figures are not recoverable here.] 


By calculating the coefficient of variation in each case, find 
which team may be considered more consistent. 
[Punjab Uni. B. Com. 1977 ; Bangalore Uni. B. Com. Oct. 1976] 


Solution. 
COMPUTATIONS FOR C.V. FOR TEAM 'A' 

X̄(A) = ΣfX/N = 1.06 

σ²(A) = ΣfX²/N − (X̄(A))² = 2.83 − (1.06)² 
      = 2.83 − 1.1236 = 1.7064 

σ(A) = √1.7064 = 1.3063 

∴ C.V. for team A = 100σ(A)/X̄(A) = (100 × 1.3063)/1.06 = 123.24 


COMPUTATIONS FOR C.V. FOR TEAM 'B' 

X̄(B) = ΣfX/N = 1.2 

σ²(B) = ΣfX²/N − (X̄(B))² = 3.15 − (1.2)² 
      = 3.15 − 1.44 = 1.71 

σ(B) = √1.71 = 1.308 

∴ C.V. for team B = 100σ(B)/X̄(B) = (100 × 1.308)/1.2 = 109.0 


Since the C.V. for team B is less than the C.V. for team A, 
team B may be considered to be more consistent. 

Example 6.32. From the prices of shares of X and Y given 
below, state which share is more stable in value. 


X :  55   54   52   53   56   58   52   50   51   49 
Y : 108  107  105  105  106  107  104  103  104  101 


(Delhi U. B. Com. 1982) 
Solution 


COMPUTATION OF MEAN AND S.D. OF SHARES 
X AND Y 

  X     X − X̄ = X − 53   (X − X̄)²      Y     Y − Ȳ = Y − 105   (Y − Ȳ)² 
  55          2              4        108          3               9 
  54          1              1        107          2               4 
  52         −1              1        105          0               0 
  53          0              0        105          0               0 
  56          3              9        106          1               1 
  58          5             25        107          2               4 
  52         −1              1        104         −1               1 
  50         −3              9        103         −2               4 
  51         −2              4        104         −1               1 
  49         −4             16        101         −4              16 

ΣX = 530      0      Σ(X − X̄)² = 70    ΣY = 1050      0      Σ(Y − Ȳ)² = 40 


X̄ = ΣX/n = 530/10 = 53 ;  Ȳ = ΣY/n = 1050/10 = 105 

σ(X) = √[Σ(X − X̄)²/n] = √(70/10) = √7 = 2.646 

σ(Y) = √[Σ(Y − Ȳ)²/n] = √(40/10) = √4 = 2 

C.V. (X) = 100σ(X)/X̄ = (100 × 2.646)/53 = 4.99 

C.V. (Y) = 100σ(Y)/Ȳ = (100 × 2)/105 = 1.90 


Since C.V. (Y) is less than C.V. (X), the shares of Y are 
more stable in value. 
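
When the raw observations are available, the coefficient of variation can be computed directly from them. The sketch below uses Python's standard `statistics` module; the function name `cv_from_data` is an illustrative choice, and the fourth price of X is taken as 53, the value consistent with ΣX = 530 and σ = √7 in the solution.

```python
from statistics import fmean, pstdev


def cv_from_data(data):
    """C.V. (in per cent) from raw data, using the population s.d. (divisor n)."""
    return 100 * pstdev(data) / fmean(data)


# Share prices of Example 6.32 (fourth X value assumed to be 53, see above).
x = [55, 54, 52, 53, 56, 58, 52, 50, 51, 49]
y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x, cv_y = cv_from_data(x), cv_from_data(y)
more_stable = "X" if cv_x < cv_y else "Y"
print(round(cv_x, 2), round(cv_y, 2), more_stable)
```

`pstdev` is used rather than `stdev` because the text divides by n, not by n − 1, when computing the standard deviation.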


6.12. Relation Between Various Measures of Dispersion. 
For a Normal distribution (c.f. chapter on theoretical distributions) 
we have the following relations between the different 
measures of dispersion : 

(i) Mean ± Q.D. covers 50% of the observations of the distribution. 

(ii) Mean ± M.D. covers 57.5% of the observations. 

(iii) Mean ± σ includes 68.27% of the observations. 
(iv) Mean ± 2σ includes 95.45% of the observations. 
(v) Mean ± 3σ includes 99.73% of the observations. 
(vi) Q.D. = 0.6745σ ≈ (2/3)σ (approximately) 

(vii) M.D. = √(2/π) . σ = 0.7979σ ≈ (4/5)σ (approximately) 

(viii) Q.D. = 0.8453 M.D. [From (vi) and (vii)] 
⇒ Q.D. = (5/6) M.D. (approximately). 
Combining the results (vi), (vii) and (viii) we get approximately : 
3 Q.D. = 2 S.D. ; 5 M.D. = 4 S.D. ; 6 Q.D. = 5 M.D. 
⇒ 4 S.D. = 5 M.D. = 6 Q.D. 


Thus we see that standard deviation ensures the highest degree 
of reliability and Q.D. the lowest. 


(ix) We have : 
Q.D. : M.D. : S.D. : : (2/3)σ : (4/5)σ : σ 
⇒ Q.D. : M.D. : S.D. : : 10 : 12 : 15. 
(x) Range = 6 S.D. = 6σ 
Remarks 1. Rigorously speaking, the above results for various 
measures of dispersion hold for Normal distribution discussed 
in chapter on theoretical distributions. However, these results are 
approximately true even for symmetrical distributions or moderately 
asymmetrical (skewed) distributions. 


2. In the above results we have expressed various measures 
of dispersion in terms of standard deviation. We give below the 
relations expressing standard deviation in terms of other measures 
of dispersion. 


S.D. = 1.2533 M.D. ≈ (5/4) M.D. 

S.D. = 1.4826 Q.D. ≈ (3/2) Q.D. 

S.D. = (1/6) Range 

Also we have 

M.D. = 1.1830 Q.D. ≈ (6/5) Q.D. 
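
The numerical constants quoted in this section can be verified from the standard Normal distribution. The sketch below relies on `statistics.NormalDist` from the Python standard library; the helper name `coverage` is introduced here for illustration only.

```python
from math import pi, sqrt
from statistics import NormalDist

std_normal = NormalDist()  # Normal distribution with mean 0 and s.d. 1

# For a Normal distribution Q3 lies 0.6745 sigma above the mean, so Q.D. = 0.6745 sigma.
qd_over_sd = std_normal.inv_cdf(0.75)
# Mean deviation about the mean of a Normal distribution is sqrt(2/pi) sigma.
md_over_sd = sqrt(2 / pi)


def coverage(k):
    """Proportion of a Normal distribution lying within mean +/- k sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


print(round(qd_over_sd, 4), round(md_over_sd, 4), round(qd_over_sd / md_over_sd, 4))
for k in (qd_over_sd, md_over_sd, 1, 2, 3):
    print(round(k, 4), round(100 * coverage(k), 2))
```

The fractions 2/3, 4/5 and 5/6 used in the text are convenient roundings of these constants, which is why the relations built on them hold only approximately.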
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EXERCISE 6.4 


1. (a) What do you understand by absolute and relative measures of 
dispersion ? Explain advantages of the relative measures over the absolute 
measures of dispersion. (Bombay U. B. Com. May 1980) 


(b) What do you understand by coefficient of variation ? What purpose 
does it serve ? 


2. The arithmetic mean of runs secured by the three batsmen, X, Y and 
Z in a series of 10 innings are 50, 48 and 12 respectively. The standard-deviation 
of their runs are 15, 12 and 2 respectively. Who is the most consistent of the 
three ? (Osmania Uni. B. Com. Oct. 1983) 


3. Two samples A and B have the same standard deviations, but the 
mean of A is greater than that of B. The coefficient of variation of A is : 
(i) greater than that of B ; (ii) less than that of B ; 


(iii) equal to that of B ; (iv) none of these. 
[C.A. (Intermediate) (N.S.) Nov. 1982] 


Ans. (ii) 
4. The coefficient of variation and standard deviation of the series are 
58% and 22 respectively. What is the arithmetic mean of the series ? 
[Punjab U. B. A. (Econ. Hons.) 1978) 


Ans. Mean=37.93 


5. The coefficient of variation of a distribution is 60% and its standaYd 
deviation is 12. Find out its mean, 


(Delhi U. B. Com. (External) 1982) 
Ans. Mean=20 
6. Coefficient of variations of two series are 6095 and 8095. Their 
standard deviations are 20 and 16. What are their arithmetic means ? 
[Delhi Uni. B. Com. (Hons.) 1970] 
Ans. 33:3, 20 


7. Comment on the statement "After settlement the average weekly
wage in a factory had increased from Rs. 8 to Rs. 12 and the standard deviation
had increased from 2 to 2.5. After settlement, the wages have become higher and
more uniform." [Delhi Uni. B.A. Econ. (Hons.) 1972]


Ans. Yes.

8. A study of examination results of a batch of students showed the
average marks secured as 50 with a standard deviation of 2 in the first year of
their studies. The same batch showed an average of 60 marks with an increased
standard deviation of 3, after five years of studies. Can you say that the batch
as a whole showed improved performance ?

[Delhi Uni. B.A. (Econ. Hons.) 1979]


Ans. Improved performance (better average) but less consistent (C.V. rises from 4% to 5%).


9. The means and standard deviations of two brands of light bulbs are
given below :

            Brand I        Brand II
Mean        800 hours      770 hours
S.D.        100 hours       60 hours

Calculate a measure of relative dispersion for the two brands and interpret
the result. [Delhi Uni. B.A. (Econ. Hons. I) (N.S.), 1983]
Ans. C.V. (I) = 12.5 ; C.V. (II) = 7.79 ; Brand II is more uniform.


10. The number of employees, wages per employee and the variance of
the wage per employee for two factories are given below :

                                          Factory A    Factory B
No. of employees                              50          100
Average wages per employee per
month (in Rs.)                               120           85
Variance of the wages per employee
per month (in Rs.)                             8           43

In which factory is there greater variation in the distribution of wages
per employee ?
[Delhi Uni. B. Com. (Hons.) 1982]


Ans. In factory B ; [C.V. (A) 25 ; C.V. (B) 247]


11. An analysis of monthly wages of workers of two organisations C
and D yielded the following results :

                               Organisation
                               C            D
No. of workers                 50           60
Average monthly wages        Rs. 60       Rs. 48
Variance                      100          144

Obtain the average monthly wages and standard deviation of wages of
all workers in the two organisations taken together. Which organisation is
more equitable in regard to wages ? [Delhi U. B.A. (Econ. Hons.) 1981]


Ans. Combined mean = Rs. 53.45, combined s.d. = Rs. 12.64 ; Organisation C
is more equitable in regard to wages.


12. The mean and standard deviation of 200 items are found to be 60
and 20 respectively. At the time of calculations, two items were wrongly taken
as 3 and 67 instead of 13 and 17. Find the correct mean and standard deviation.
What is the correct coefficient of variation ?

[Delhi U. B.A. (Econ. Hons.) 1981]
Ans. Corrected mean = 59.8, s.d. = 20.09 ; C.V. = 33.60


13. Find the coefficient of variation for the following data.
Size (in cm) :  35-40  30-35  25-30  20-25  15-20  10-15
No. of items :    15     20     35     20      8      2

(Delhi U. B. Com. 1979)
Ans. 21.90
14. Calculate standard deviation and its coefficient from the following
data :
Age under :               10   20   30   40   50   60   70   80
No. of persons dying :    15   30   53   75  100  110  115  115

[Punjab U. B.A. (Econ. Hons. II) April 1983]
Ans. S.D. = 16.56 ; C.V. = 52.25


15. Calculate coefficient of variation from the following data :

Income (Rs.)        No. of families
Less than 700            12
Less than 800            30
Less than 900            50
Less than 1000           75
Less than 1100          110
Less than 1200          120

Ans. 16.12 (Himachal Pradesh U. B. Com. April 1982)


16. (a) Distinguish between mean deviation and standard deviation.
(b) Calculate coefficient of standard deviation from the following data.

Marks               No. of students
More than 0             100
More than 10             90
More than 20             75
More than 30             50
More than 40             25
More than 50             15
More than 60              5
More than 70              0

[Punjab U. B.A. (Econ. Hons.) 1980 ; Kurukshetra U. B. Com. Sept. 1980]
Ans. 0.51419, C.V. = 51.419


17. The following is the record of goals scored by team A in the football
season :

No. of goals scored by
team A in a match    :  0   1   2   3   4
Number of matches    :  1   9   7   5   3

For team B the average number of goals scored per match was 2.5 with
a standard deviation of 1.25 goals.
Find which team may be considered more consistent.
(Guru Nanak Dev U. B. Com. 1981)


Ans. C.V. (A) = 54.77 ; C.V. (B) = 50 ; B is more consistent.
18. From the prices of the shares A and B below, state which is more stable
in value.
A : 110 108 104 106 112 116 104 100 102 98
B : 216 214 210 210 212 214 208 206 208 202
(Nagarjuna U. B. Com. Oct. 1981)


Ans. C.V. (A) = 4.99 ; C.V. (B) = 1.90 ; Share B is more consistent in
price.


19. The runs scored by two batsmen A and B in ten innings are as
follows :
By A : 10 115 5 73 7 120 36 84 29 19
By B : 45 12 76 42 4 50 37 48 13 0

Who is the better run getter ? Who is more consistent ?
[Delhi U. B. Com. (Hons.) 1980]


Ans. Mean (A) = 49.8 ; C.V. (A) = 84.92 ; Mean (B) = 32.7 ; C.V. (B) = 70.89.
Player A is the better run getter and player B is more consistent.


20. Find the standard deviations in the following figures to show
whether the variation is greater in the yield or area of the field.

Years      Area in       Yield in
           lakh acres    lakh bales
1914-15       152           49
1915-16       114           51
1916-17       138           50
1917-18       154           45
1918-19       144           40
1919-20       153           43
1920-21       144           59
1921-22       117           60
1922-23       136           63
1923-24       154           60

(Himachal Pradesh Uni. B. Com. April 1982)


Ans. C.V. (Area) = 9.95 ; C.V. (Yield) = 14.72 ; Variation is greater in
the yield.


21. Compile a table, showing the frequencies with which words of
different number of letters occur in the extract reproduced below (omitting
punctuation marks) treating as the variable the number of letters in each word,
and obtain the mean, median and the coefficient of variation of the
distribution :

"Success in the examination confers no absolute right to appointment
unless Government is satisfied, after such enquiry as may be considered
necessary that the candidate is suitable in all respects for appointment to the
public service".


Ans. Mean = 5.5 ; Median = 5 ; s.d. = 3.12 ; C.V. = 56.7

22. Prepare a frequency distribution of the number of letters in a word
from the following excerpt (ignore punctuation marks) and obtain the mean,
s.d. and coefficient of variation.

"In the beginning", said a Persian Poet, "Allah took a rose, a lily, a
dove, a serpent, a little honey, a Dead Sea Apple and a handful of clay. When
he looked at the amalgam, it was a woman".


Ans. Mean = 3.56, σ = 2.0976, C.V. = 58.85
23. From the data given below, state which team (A or B) is more
consistent :

No. of goals scored      No. of matches
in a match                A        B
0                        27        1
1                         9        5
2                         8        8
3                         5        9
4                         1       27

(Kurukshetra U. B. Com. II, April 1982)
Ans. C.V. (A) = 127.84 ; C.V. (B) = 36.06 ; Team B is more consistent.
24. Following is the table giving weight of the students of two classes.
Calculate the coefficient of variation of the two distributions. Which series is
more variable ?

Weight in kgs      Class A    Class B
20-30                  7          5
30-40                 10          9
40-50                 20         21
50-60                 18         15
60-70                  7          6
Total                 62         56

(Kurukshetra U. B. Com. 1981)
Ans. C.V. (A) = 24.99 ; C.V. (B) = 23.54 ; Distribution in class A is more variable.
25. A purchasing agent obtained samples of incandescent lamps from
two suppliers, A and B. He had the samples tested for length of life with the
following results :

Length of life               Supplier A    Supplier B
500 and under 700                10             3
700 and under 900                16            42
900 and under 1,100              30            12
1,100 and under 1,300             8             4

Which supplier's lamps are more uniform as regards their length of life ?
[Delhi U. B. Com. (Hons.) 1979]


Ans. Mean (A) = 912.5, σ (A) = 179.8, C.V. (A) = 19.7 ; Mean (B) = 855.74,
σ (B) = 131.23, C.V. (B) = 15.3 ; Supplier B's lamps are more uniform as
regards their length of life.

26. Samples of polythene bags from two manufacturers, A and B, are
tested by a prospective buyer for bursting pressure and the results are as
follows :

Bursting pressure (lb)     No. of bags
                            A       B
 5.0- 9.9                   2       9
10.0-14.9                   9      11
15.0-19.9                  29      18
20.0-24.9                  54      32
25.0-29.9                  11      27
30.0-34.9                   5      13

Which set of bags has more uniform pressure ? If prices are the same,
which manufacturer's bags would be preferred by the buyer ? Why ?
[Delhi U. B. Com. (Hons.) 1981]


Ans. Mean (A) = 21 ; σ (A) = 4.878, C.V. (A) = 23.23 ; Mean (B) = 21.81,
σ (B) = 7.0775 ; C.V. (B) = 32.44
27. Two brands of tyres are tested with the following results :

Life (thousand      Number of tyres
of miles)           Brand X    Brand Y
20-25                   1          0
25-27.5                 7          4
27.5-30                15         20
30-31                  10         32
31-32                  15         30
32-33                  17         12
33-34                  13          2
34-35                   9          0
35-37.5                 8          0
37.5-40                 2          0
40-45                   3          0
Total                 100        100

(a) Draw a histogram for each frequency distribution.

(b) Which brand of tyres would you use on your fleet of trucks, and
why ?

(c) If the law forbids truck tyres to be used for more than 30,000 miles,
how does that change your answer, if at all ?
[Delhi Uni. B. Com. (Hons.) 1971]


Hint. (a) Adjust the frequencies taking each class interval as unity
(one).

(b) First of all convert the given distribution into a continuous frequency
distribution of equal intervals, each of magnitude 5, viz., 20-25, 25-30, ...,
40-45, for each brand of tyres.

Now compute C.V. (c.f. § 6.11) for each brand X and Y of tyres. The
brand of tyres with lesser C.V. will be recommended for use on the fleet of
trucks.

(c) If the law forbids truck tyres to be used for more than 30,000 miles,
then under the circumstances we would recommend the tyre with greater average
life.


Ans. (b) Brand Y to be used, (c) Brand X to be used.


28. For a group of 50 male workers, the mean and standard deviation
of their weekly wages are Rs. 63 and Rs. 9 respectively. For a group of 40 female
workers these are Rs. 


Ans. Ста. 


29. The first of the two samples has 100 items with mean 15 and standard
deviation 3. If the whole group has 250 items with mean 15.6 and standard
deviation $\sqrt{13.44}$, find the standard deviation of the second group.

[U.C.A. (Intermediate) Dec. 1982 ; Madras Uni. B. Com. Nov. 1978 ;
Punjabi U. M.A. Econ. 1977]
Ans. Mean of second group = 16 ; s.d. of second group = 4.
30. Given the following :

Sample number     Mean                 Variance             Sample size
1                 $\bar{X}_1 = 60$     $\sigma_1^2 = 9$     $n_1 = 100$
2                 $\bar{X}_2 = 40$     $\sigma_2^2 = 4$     $n_2 = 50$

State the formula for their combined variance and find the same.

(Bombay U. B. Com. April, 1979)
Ans. 96.22.


31. A company has three establishments E1, E2, E3 in three cities. Analysis
of the monthly salaries paid to the employees in the three establishments is
given below :

                                      E1      E2      E3
Number of employees                   20      25      40
Average monthly salary (Rs.)         305     300     340
Standard deviation of
monthly salary (Rs.)                  50      40      45

Find the average and the standard deviation of the monthly salaries of
all the 85 employees in the Company. [I.C.W.A. (Intermediate) Dec. 1981]


Ans. Mean = Rs. 320 ; s.d. = Rs. 48.69.


32. An analysis of monthly wages paid to the workers in two firms A
and B belonging to the same industry gives the following results :

                                       Firm A         Firm B
Number of workers                        500            600
Average monthly wage                 Rs. 186.00     Rs. 175.00
Variance of distribution of wages         81            100

(i) Which firm, A or B, has a larger wage bill ?

(ii) In which firm, A or B, is there greater variability in individual
wages ? (Bangalore Uni. B. Com. May 1979)

(iii) Calculate (a) the average monthly wage, (b) the variance of the
distribution of wages, of all the workers in the firms A and B taken together.
(Madurai U. B. Com. 1978)
Ans. (i) Firm B has larger wage bill.
(ii) Firm B has greater variability in individual wages.
(iii) Combined mean = Rs. 180 ; combined variance = 121.36.
33. If the mean deviation of a moderately skewed distribution is 7.2
units, find the standard deviation as well as quartile deviation.
[Delhi Uni. B.A. Econ. (Hons.), 1978]


Ans. S.D. $\approx \frac{5}{4}$ M.D. = 9.0 ; Q.D. $\approx \frac{5}{6}$ M.D. = 6.0.
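
Several exercises above (for instance 11, 30 and 31) pool two or more groups into one. The following is a minimal sketch of that calculation, assuming variances are computed with divisor N as elsewhere in this chapter; the function name `combine_groups` is an illustrative choice, not notation from the text.

```python
import math


def combine_groups(sizes, means, variances):
    """Return (combined mean, combined variance) of several groups.

    Each group's variance is increased by the squared distance of its
    mean from the combined mean before the weighted average is taken.
    """
    total = sum(sizes)
    mean = sum(n * m for n, m in zip(sizes, means)) / total
    variance = sum(
        n * (v + (m - mean) ** 2)
        for n, m, v in zip(sizes, means, variances)
    ) / total
    return mean, variance


# Exercise 11: organisations C and D.
mean_cd, var_cd = combine_groups([50, 60], [60, 48], [100, 144])
sd_cd = math.sqrt(var_cd)

# Exercise 30: two samples with variances 9 and 4.
mean_30, var_30 = combine_groups([100, 50], [60, 40], [9, 4])

# Exercise 31: three establishments (standard deviations squared to variances).
mean_31, var_31 = combine_groups([20, 25, 40], [305, 300, 340],
                                 [50 ** 2, 40 ** 2, 45 ** 2])
sd_31 = math.sqrt(var_31)

print(round(mean_cd, 2), round(sd_cd, 2))
print(round(var_30, 2))
print(round(mean_31, 2), round(sd_31, 2))
```

Note that the combined variance is not a simple weighted average of the group variances; the spread between the group means also contributes.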


6.14. Lorenz Curve. Lorenz curve is a graphic method of
studying the dispersion in a distribution. It was first used by Max
O. Lorenz, an economic statistician, for the measurement of economic
inequalities such as in the distribution of income and wealth
between different countries or between different periods of time.
But today, Lorenz curve is also used in business to study the
disparities of the distribution of wages, profits, turnover, production,
population etc.


A very distinctive feature of the Lorenz curve consists in dealing
with the cumulative values of the variable and the cumulative
frequencies rather than its absolute values and the given frequencies.
The technique of drawing the curve is fairly simple and consists of
the following steps :


(i) The size of the item (variable value) and the frequencies
are both cumulated. Taking grand total for each as 100, express
these cumulated totals for the variable and the frequencies as
percentages of their corresponding grand totals.


(ii) Now take coordinate axes, X-axis representing the percentages
of the cumulated frequencies (x) and Y-axis representing
the percentages of the cumulated values of the variable (y). Both
x and y take the values from 0 to 100 as shown in Fig. 6.1.


(iii) Draw the diagonal line y = x joining the origin O (0, 0)
with the point P (100, 100) as shown in the diagram. The line OP
will make an angle of 45° with the X-axis and is called the line of
equal distribution.


(iv) Plot the percentages of the cumulated values of the variable
(y) against the percentages of the corresponding cumulated
frequencies (x) for the given distribution and join these points with
a smooth free hand curve. Obviously, for any given distribution
this curve will never cross the line of equal distribution OP. It will
always lie below OP unless the distribution is uniform (equal), in
which case it will coincide with OP.

Thus when the distribution of items is not proportionately
equal, the variability (dispersion) is indicated and the curve is farther
from the line of equal distribution OP. The greater the variability,
the greater is the distance of the curve from OP.
Let us consider the following Lorenz curve (Fig. 6.1) for the
distribution of income, (say).

[Fig. 6.1. Lorenz curve : X-axis, No. of persons (0 to 100) ; Y-axis, Income
(0 to 100) ; line of equal distribution OP with P (100, 100) ; curves OAP,
OBP and OCP.]

In the diagram, OP is the line of equal distribution of income.
If the plotted cumulative percentages lie on this line, there
is no variability in the distribution of income of persons. The points
lying on the curve OAP indicate a lesser degree of variability as
compared to the points lying on the curve OBP. Variability is still
greater when the points lie on the curve OCP. Thus a measure of
variability of the distribution is provided by the distance of the
curve of the cumulated percentages of the given distribution from
the line of equal distribution.

Remarks 1. An obvious disadvantage of the Lorenz curve is
that it gives us only a relative idea of the dispersion as compared
with the line of equal distribution. It does not provide us any
numerical value of the variability for the given distribution.
Accordingly it should be used together with some numerical measure of
dispersion. However, this should not undermine the utility of Lorenz
curve in studying the variability of the distributions, particularly
those relating to income, wealth, wages, profits, lands, capitals etc.


2. From the Lorenz curve we can immediately find out
what percentage of persons (frequencies) corresponds to a given
percentage of the item (variable value).


Example 6.33. From the following table giving data regarding
income of workers in a factory, draw a graph (Lorenz curve) to study
the inequality of income :

Income (in Rs.)      No. of workers in the factory
Below 500                    6,000
500-1,000                    4,250
1,000-2,000                  3,600
2,000-3,000                  1,500
3,000-4,000                    650

Solution.

CALCULATIONS FOR LORENZ CURVE

Income       Mid-     Cumulative   Percentage of   No. of     Cumulative   Percentage of
(in Rs.)     value    income       cumulative      workers    frequency    cumulative
                                   income          (f)                     frequency
(1)          (2)      (3)          (4)             (5)        (6)          (7)
0-500          250       250          2.94          6,000       6,000         37.5
500-1000       750     1,000         11.76          4,250      10,250         64.1
1000-2000    1,500     2,500         29.41          3,600      13,850         86.6
2000-3000    2,500     5,000         58.82          1,500      15,350         95.9
3000-4000    3,500     8,500        100.00            650      16,000        100.0
Total        8,500                                 16,000

[Fig. 6.2. Lorenz curve : percentage of persons (X-axis, 0 to 100) against
percentage of income (Y-axis, 0 to 100).]

The curve prominently exhibits the inequality of the
distribution of income among the factory workers.
Remark. From the Lorenz curve we observe that 70% of the
persons get only 15% of the income and 90% of the persons get only
35% of the income.
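
Columns (4) and (7) of the calculation table can be reproduced with a short script. This is a minimal sketch that follows the table as printed, i.e. it cumulates the class mid-values directly rather than weighting them by the number of workers; the helper name `cumulative_percentages` is an illustrative assumption.

```python
def cumulative_percentages(values):
    """Running totals of `values`, expressed as percentages of the grand total."""
    total = sum(values)
    running = 0
    percentages = []
    for value in values:
        running += value
        percentages.append(100.0 * running / total)
    return percentages


# Data of Example 6.33: class mid-values (column 2) and workers (column 5).
mid_values = [250, 750, 1500, 2500, 3500]
workers = [6000, 4250, 3600, 1500, 650]

income_pct = cumulative_percentages(mid_values)   # column (4), plotted on the Y-axis
persons_pct = cumulative_percentages(workers)     # column (7), plotted on the X-axis

# Points of the Lorenz curve, starting from the origin O (0, 0).
lorenz_points = [(0.0, 0.0)] + list(zip(persons_pct, income_pct))

for x, y in lorenz_points:
    print(f"{x:6.1f}% of persons  ->  {y:6.2f}% of income")
```

Every plotted point has y no greater than x, so the curve lies on or below the line of equal distribution OP, as stated in step (iv) of the construction.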
EXERCISE 6.5


1. What is a Lorenz curve ? How do you construct it ? What is its use ?


2. From the following table giving data regarding income of employees
in two factories, draw a graph (Lorenz curve) to show which factory has greater
inequalities of income :

Income (Rs.)       Factory A    Factory B
Below 200            7,000         800
200-500              1,000       1,200
500-1,000            1,200       1,500
1,000-2,000            800         400
2,000-3,000            500         200


3. The frequency distributions of marks obtained in Mathematics (M)
and English (E) are as follows :

Mid-value of marks :   5 15 25 35 45 55 65 75 85 95
No. of students (M) :  10 12 13 14 22 27 20 12 11 9
No. of students (E) :  1 2 26 50 59 40 10 8 3

Analyse the data by drawing the Lorenz curves on the same diagram
and describe the main features you observe.


4. Draw Lorenz curves for the comparison of profits of two groups of
companies, A and B, in business. What is your conclusion ?

Total amount of profits        Number of companies in
earned by companies

[Punjab Uni. B.A. (Econ. Hons.) 1978]
5. (a) Write an explanatory note on Lorenz curve.

(b) The following table gives the population and earnings of residents
in towns A and B. Represent the data graphically so as to bring out the
inequality of the distribution of the earnings of residents.

         Town A                          Town B
Persons    Earnings (Daily)     Persons    Earnings (Daily)
  50            35                100           160
  50            50                140           320
  50            75                 60           120
  50           115                 50           280
  50           160                200           400
  50           180                 90           400
  50           225                 60           280
  50           300                 40           920
  50           425                160           240
  50           925                100           960
 500          2500               1000          4000   (Total)


Ans. Inequality of incomes is more prominent in Town A.


EXERCISE 6.6
(Objective Type Questions)


I. Match the correct parts to make a valid statement :

(a) Algebraic sum of deviations from mean       (i) $Q_3 - Q_1$
(b) Coefficient of Mean Deviation               (ii) $\dfrac{100\,\sigma}{\text{Mean}}$
(c) Variance                                    (iii) Zero
(d) Quartile Deviation                          (iv) $\dfrac{1}{N}\sum f\,|X - \bar{X}|$
(e) Coefficient of Variation                    (v) $\dfrac{1}{N}\sum f\,(X - \bar{X})^2$
(f) Sum of absolute deviations from median      (vi) $\dfrac{\text{M.D. about Mean}}{\text{Mean}}$
(g) Interquartile range                         (vii) $(Q_3 - Q_1)/2$
(h) Mean deviation                              (viii) Minimum


Ans. (a) - (iii) ; (b) - (vi) ; (c) - (v) ; (d) - (vii) ;
(e) - (ii) ; (f) - (viii) ; (g) - (i) ; (h) - (iv).

II. In the following questions, tick the correct answer :
(i) Algebraic sum of deviations from mean is :
(a) Positive, (b) Negative, (c) Zero, (d) Different for each case.
(ii) Sum of squares of deviations is minimum when taken from :
(a) Mean, (b) Median, (c) Mode, (d) None of these.

(iii) Sum of absolute deviations is minimum when measured from :
(a) Mean, (b) Median, (c) Mode, (d) None of these.
(iv) For a discrete frequency distribution :
(a) S.D. M.D., (b) S.D. ≥ M.D., (c) S.D. > M.D., (d) S.D. M.D.,
(e) None of these, where M.D. = Mean Deviation from mean.
(v) The range of a given distribution is
(a) Greater than s.d., (b) Less than s.d., (c) Equal to s.d., (d) None
of these.
(vi) The measure of dispersion independent of frequencies of the given
distribution is
(a) Range, (b) s.d., (c) M.D., (d) Q.D.

(vii) In case of open end classes, an appropriate measure of dispersion
to be used is
(a) Range, (b) Q.D., (c) M.D., (d) s.d.

(viii) Measure of dispersion which is affected most by extreme observations
is
(a) Range, (b) Q.D., (c) M.D., (d) s.d.
(ix) Mean deviation from median (Md) is given by :


— X—Md 
(а) сыр! (b 2 M 1 


| 
(с) [мат (2 Mat 


(x) Quartile deviation is given by: 


O 9-0, 0 BER, у LW (gy 9-0. 


(xi) Step deviation formula for variance is: 
3d? Zd s zd? ха ү ZdX* ха 
o xu) oi. E) (22) = 


в (22) (му j 


(xii) If the distribution is approximately normal, then,
@ M.D.-2., Q) MD- 3, (с) M.D.=4 в (d) None of 
these. 
(xiii) For a normal distribution, 
@ Wate (0 QD.-2 в, (с) Q.D.=c, (d) None of 
these. 


(xiv) For a normal distribution,
(a) Q.D. > M.D., (b) Q.D. < M.D., (c) Q.D. = M.D.

(xv) For a normal distribution the range $\text{Mean} \pm 1\sigma$ covers
(a) 65%, (b) 68.26%, (c) 85%, (d) 95% of the items.

Ans. (i) - (c) ; (ii) - (a) ; (iii) - (b) ; (iv) - (b) ; (v) - (a) ;
(vi) - (a) ; (vii) - (b) ; (viii) - (a) ; (ix) - (a) ; (x) - (d) ;
(xi) - (b) ; (xii) - (c) ; (xiii) - (a) ; (xiv) - (b) ; (xv) - (b).

III. Fill in the blanks :

(i) Algebraic sum of deviations is zero from ......
(ii) The sum of absolute deviations is minimum from ......
(iii) Standard deviation is always ...... than range.
(iv) Standard deviation is always ...... than mean deviation.
(v) All relative measures of dispersion are ...... from units of
measurement.
(vi) Variance is the ...... value of mean square deviation.
(vii) If $Q_1 = 10$, $Q_3 = 40$, the coefficient of quartile deviation is ......
(viii) If 25% of the items in a distribution are less than 10 and 25% are
more than 40, quartile deviation is ......
(ix) The median and s.d. of a distribution are 15 and 5 respectively. If
each item is increased by 5, the new median = ...... and s.d. = ......
(x) A computer showed that the s.d. of 40 observations ranging from
120 to 150 is 35. The answer is correct/wrong. Tick right one.
Ans. (i) Arithmetic mean (ii) Median (iii) Less (iv) Greater (v) Free
(vi) Minimum (vii) 0.6 (viii) 0.6 (ix) 20, 5 (x) Wrong, since s.d. can't
exceed range.
IV. Fill in the blanks :
(i) The ...... the Lorenz Curve is from the line of equal distribution, the
greater is the variability in the series.
(ii) Quartile deviation is ...... measure of dispersion.
(iii) If $Q_1 = 20$ and $Q_3 = 50$, the coefficient of quartile deviation is ......
(iv) If in a series, coefficient of variation is 64 and mean 10, the standard
deviation shall be ......
Ans. (i) Farther (ii) Absolute (iii) 0.43 (iv) 6.4
V. State whether the following statements are true or false. In case of
false statements, give the correct statement.
(i) Algebraic sum of deviations from mean is minimum.
(ii) Mean deviation is least when calculated from median.
(iii) Variance is always non-negative.
(iv) Mean, standard deviation and coefficient of variation have same
units.
(v) Relative measures of dispersion are independent of units of
measurement.
(vi) Mean and standard deviation are independent of change of origin.
(vii) Variance is square of standard deviation.
(viii) Standard deviation is independent of change of origin and scale.
(ix) Variance is the minimum value of mean square deviation.
(x) Q.D. = $\frac{2}{3}$ × (s.d.), always.
(xi) Mean deviation can never be negative.
(xii) M.D. = $\frac{2}{3} \times \sigma$, for normal distribution.

(xiii) If mean and s.d. of a distribution are 20 and 4 respectively,
C.V. = 15%.
(xiv) If each value in a distribution of 5 observations is 10, then its mean
is 10 and variance is 1.
(xv) In a discrete distribution s.d. ≥ M.D. (about mean).
Ans. (i) False, (ii) True, (iii) True, (iv) False, (v) True, (vi) False,
(vii) True, (viii) True, (ix) True, (x) False, (xi) True, (xii) False,
(xiii) False, (xiv) False, (xv) True.
VI. The mean and s.d. of 100 observations are 50 and 10 respectively.
Find the new mean and standard deviation,
(i) if 2 is added to each observation,
(ii) if 3 is subtracted from each observation,
(iii) if each observation is multiplied by 5,
(iv) if 2 is subtracted from each observation and then it is divided
by 5.
Ans. (i) 52, 10 (ii) 47, 10 (iii) 250, 50 (iv) 9.6, 2
VII. The sum of squares of deviations of 15 observations from their
mean 20 is 240. Find (i) s.d. and (ii) C.V.
Ans. σ = 4, C.V. = 20.
VIII. State, giving reasons, whether the following statements are true or
false.
(i) Standard deviation can never be negative.
(ii) Sum of squares of deviations measured from mean is least.
(Delhi U. B. Com. 1983)
Ans. (i) True, (ii) True.
IX. Comment briefly on the following statements :

(i) The mean of the combined series lies between the means of the two
component series.

(ii) The standard deviation of the combined series lies between the
standard deviations of the two component series.

(iii) Mean can never be equal to standard deviation.
(iv) Mean can never be equal to variance.
(v) A consistent cricket player has greater variability in test scores.
Ans. (i) True (ii) False (iii) False (iv) False (v) False.
X. Comment briefly on the following statements :
(i) The median is the point about which the sum of squared deviations
is minimum.
(ii) Since $\sum (X_i - \bar{X}) = 0$, $\therefore \sum (X_i - \bar{X})^2 = 0$.
(iii) A computer obtained the standard deviation of 25 observations
whose values ranged from 65 to 85 as 25.

(iv) A student obtained the mean and variance of a set of 10 observations
as 10, -5 respectively.

(v) The range is the perfect measure of variability as it includes all the
measurements.

(vi) For the distribution of 5 observations :
8, 8, 8, 8, 8,
mean = 8 and variance = 8.
(vii) If the mean and s.d. of distribution A are smaller than the mean and
s.d. of distribution B respectively, then the distribution A is more
uniform (less variable) than the distribution B.
Ans. (i) False, (ii) False, (iii) False, (iv) False, (v) False, (vi) False,
(vii) False.
XI. (a) If s.d. of a group is 15, find the most likely value of (i) Mean
deviation and (ii) Quartile deviation of that group.
(b) If mean deviation of a distribution is 20, find the most likely
value of (i) s.d. and (ii) Q.D.
(c) If quartile deviation of a distribution is 6, find the most likely
value of (i) s.d. and (ii) M.D.
(d) If quartile deviation of a distribution is 20 and its mean is 60,
obtain the most likely value of (i) Coefficient of variation,
(ii) Mean deviation and (iii) Coefficient of mean deviation.
In all the above parts, assume that the distribution is normal.
Ans. (a) M.D. = 12, Q.D. = 10, (b) s.d. = 25, Q.D. = 16.67, (c) s.d. = 9,
M.D. = 7.2, (d) C.V. = 50, M.D. = 24, Coefficient of M.D. = 0.4.
Skewness and Kurtosis


7.1. Introduction. It was pointed out in the last two chapters
that we need statistical measures which will reveal clearly the salient
features of a frequency distribution. The measures of central
tendency tell us about the concentration of the observations about
the middle of the distribution and the measures of dispersion give
us an idea about the spread or scatter of the observations about
some measure of central tendency. We may come across frequency
distributions which differ very widely in their nature and composition
and yet may have the same central tendency and dispersion.
For example, the following two frequency distributions have the
same mean $\bar{X} = 15$ and standard deviation $\sigma = 6$, yet they give
histograms which differ very widely in shape and size.

Frequency distribution I Frequency distribution II 


Frequency Class. Frequency 


Thus these two measures viz., central tendency and dispersion
are inadequate to characterise a distribution completely and they
must be supported and supplemented by two more measures viz.,
skewness and kurtosis which we shall discuss in the following
sections. Skewness helps us to study the shape i.e., symmetry or
asymmetry of the distribution while kurtosis refers to the flatness
or peakedness of the curve which can be drawn with the help of the
given data. These four measures viz., central tendency, dispersion,
skewness and kurtosis are sufficient to describe a frequency
distribution completely.


7.2. Skewness. Literal meaning of skewness is 'lack of symmetry'.
We study skewness to have an idea about the shape of the
curve which we can draw with the help of the given frequency
distribution. It helps us to determine the nature and extent of the
concentration of the observations towards the higher or lower values
of the variable. In a symmetrical frequency distribution which is
unimodal, if the frequency curve or histogram is folded about the
ordinate at the mean, the two halves so obtained will coincide with
each other. In other words, in a symmetrical distribution equal
distances on either side of the central value will have same frequencies
and consequently both the tails (left and right) of the curve
would also be equal in shape and length. A distribution is said to
be skewed if :


(i) The frequency curve of the distribution is not a symmetric
bell shaped curve but it is stretched more to one side than to the
other. In other words, it has a longer tail to one side (left or right)
than to the other. A frequency distribution for which the curve has
a longer tail towards the right is said to be positively skewed and if
the longer tail lies towards the left, it is said to be negatively skewed.


[Fig. 7.2. Frequency curves of a symmetrical distribution, a positively
skewed distribution and a negatively skewed distribution, showing the
positions of M, Md and Mo.]


(ii) The values of mean (M), median (Md) and mode (Mo)
fall at different points i.e., they do not coincide.


(iii) Quartiles $Q_1$ and $Q_3$ are not equidistant from the median
i.e.,

$Q_3 - Md \neq Md - Q_1$
Remark. Since the extreme values give longer tails, a positively
skewed distribution will have greater variation towards the higher
values of the variable and a negatively skewed distribution will
have greater variation towards the lower values of the variable. For
example, the distribution of mortality (death) rates w.r.t. the age
after ignoring the accidental deaths will give a positively skewed
distribution. However, most of the phenomena relating to business
and economic statistics give rise to negatively skewed distribution.
For instance, the distributions of the quantity demanded w.r.t. the
price; or the number of depositors w.r.t. savings in a bank or the
number of persons w.r.t. their incomes or wages in a city will give
negatively skewed curves.


7.2.1. Measures of Skewness. Various measures of skewness
(Sk) are :


(1) $Sk = \text{Mean} - \text{Median} = M - Md$ ...(7.1)
or $Sk = \text{Mean} - \text{Mode} = M - Mo$ ...(7.1a)
(2) $Sk = (Q_3 - Md) - (Md - Q_1) = Q_3 + Q_1 - 2\,Md$ ...(7.2)


These are the absolute measures of skewness and are not of
much practical utility because of the following reasons :

(i) Since the absolute measures of skewness involve the units
of measurement, they cannot be used for comparative study of the
two distributions measured in different units of measurement.

(ii) Even if the distributions are having the same units of
measurement, the absolute measures are not recommended because
we may come across different distributions which have more or less
identical skewness (absolute measures) but which vary widely in
the measures of central tendency and dispersion.

Thus for comparing two or more distributions for skewness
we compute the relative measures of skewness, also commonly
known as coefficients of skewness, which are pure numbers independent
of the units of measurement. Moreover, in a relative measure
of skewness, the disturbing factor of variation or dispersion is
eliminated by dividing the absolute measure of skewness by a suitable
measure of dispersion. The following are the coefficients of skewness
which are commonly used :


1. Karl Pearson's Coefficient of Skewness. This is given by
the formula :


Sk = (Mean - Mode)/s.d. = (M - Mo)/σ ...(7.3)


But quite often, mode is ill-defined and is thus quite difficult 
to locate. In such a situation, we use the following empirical 
relationship between the mean, median and mode for a moderately 
asymmetrical (skewed) distribution : 


Mo = 3Md - 2M ...(7.4)
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Substituting in (7.3) we get : 
Sk = [M - (3Md - 2M)]/σ = 3(M - Md)/σ ...(7.5)


Remarks 1. Theoretically, skewness given by (7.5) or (7.3) lies
between the limits ± 3, but these limits are rarely attained in
practice.


2. From (7.3) and (7.5), skewness is zero if M=Mo=Md. 
In other words, for a symmetrical distribution mean, mode and 
median coincide. 


3. Sk > 0, if M > Md > Mo
or Sk > 0, if Mo < Md < M ...(7.6)


Thus for a positively skewed distribution, the value of the
mean is the greatest of the three measures and the value of mode is
the least of the three measures.


If the distribution is negatively skewed, the inequality in (7.6)
is reversed, i.e., the inequalities 'greater than' (i.e., >) and 'less than'
(i.e., <) are interchanged. Thus :

Sk < 0, if M < Md < Mo
or Sk < 0, if Mo > Md > M ...(7.7)


In other words, for a negatively skewed distribution, of the
three measures of central tendency, viz., mean, median and mode, the
mode has the maximum value and the mean has the least value.


4. While ‘dispersion’ studies the degree of variation in the 
given distribution, skewness attempts at studying the direction of 
variation. Extreme variations towards higher values of the variable 
give a positively skewed distribution while in a negatively skewed 
distribution, the extreme variations are towards the lower values of 
the variable. 


5. In Pearson's coefficient of skewness, the disturbing factor
of variation is eliminated by dividing the absolute measure of
skewness M - Mo by the measure of dispersion σ (standard
deviation).
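
The two forms of Karl Pearson's coefficient, (7.3) and (7.5), can be put into a short Python sketch. The function name and its arguments are illustrative choices and do not come from the text; the function uses the mode when it is supplied and otherwise falls back on the median form.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Karl Pearson's coefficient of skewness.

    Uses (M - Mo)/sigma when the mode is given, formula (7.3);
    otherwise 3(M - Md)/sigma, formula (7.5).
    """
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("supply either the mode or the median")


# Illustrative values: mean 100, median 90, s.d. 10
print(pearson_skewness(100, 10, median=90))
```

A positive value indicates positive skewness and a negative value indicates negative skewness, as in Remark 3.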

Example 7.1. Calculate Karl Pearson's coefficient of skewness
from the following data :


Size :       1    2    3    4    5    6    7
Frequency : 10   18   30   25   12    3    2


(Delhi U. B.Com. III, 1984)
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Solution, 
COMPUTATION OF MEAN, MODE AND S.D.


Size (x)    Frequency (f)      fx        fx²

   1             10            10         10
   2             18            36         72
   3             30            90        270
   4             25           100        400
   5             12            60        300
   6              3            18        108
   7              2            14         98

Total         N = 100      Σfx = 328   Σfx² = 1258

Mean (M) = Σfx/N = 328/100 = 3.28

S.D. (σ) = √[Σfx²/N - (Σfx/N)²] = √[1258/100 - (328/100)²]

         = √(12.58 - 10.7584) = √1.8216 = 1.3497


Since the maximum frequency is 30, the corresponding value of
x, viz., 3 is the mode. Thus Mode (Mo) = 3.


Karl Pearson's coefficient of skewness is given by

Sk = (M - Mo)/σ = (3.28 - 3.00)/1.3497 = 0.28/1.3497 = 0.2075

Hence the distribution is slightly positively skewed.
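
The arithmetic of Example 7.1 can be checked with a few lines of Python. This is only a sketch; the helper name `discrete_summary` is an illustrative choice, and the mode is taken as the value with the largest frequency, as in the solution above.

```python
from math import sqrt

sizes = [1, 2, 3, 4, 5, 6, 7]
freqs = [10, 18, 30, 25, 12, 3, 2]


def discrete_summary(xs, fs):
    """Return mean, s.d. and mode of a discrete frequency distribution."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mean_sq = sum(f * x * x for x, f in zip(xs, fs)) / n
    sd = sqrt(mean_sq - mean ** 2)
    mode = xs[fs.index(max(fs))]  # value with the largest frequency
    return mean, sd, mode


mean, sd, mode = discrete_summary(sizes, freqs)
sk = (mean - mode) / sd  # formula (7.3)
print(round(mean, 2), round(sd, 4), mode, round(sk, 4))
```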


Example 7.2. Calculate Karl Pearson's Coefficient of Skewness 
from the data given below : 


Weekly wages    No. of      Weekly wages    No. of
   (Rs.)        workers        (Rs.)        workers
   40-50           5          90-100          30
   50-60           6         100-110          36
   60-70           8         110-120          50
   70-80          10         120-130          60
   80-90          25         130-140          70


(Delhi U. B.Com. III, 1985)
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COMPUTATION OF MEAN, MEDIAN AND S.D. 
Weekly wages   Mid-value   No. of workers   d = (X - 85)/10    fd     fd²   'Less than'
                  (X)           (f)                                            c.f.

   40-50          45             5               -4           -20      80       5
   50-60          55             6               -3           -18      54      11
   60-70          65             8               -2           -16      32      19
   70-80          75            10               -1           -10      10      29
   80-90          85            25                0             0       0      54
   90-100         95            30                1            30      30      84
  100-110        105            36                2            72     144     120
  110-120        115            50                3           150     450     170
  120-130        125            60                4           240     960     230
  130-140        135            70                5           350    1750     300

               N = Σf = 300                              Σfd = 778   Σfd² = 3510

Since the maximum frequency, viz., 70 occurs towards the end
of the frequency distribution, mode is ill-defined in this case. Hence
we obtain Karl Pearson's coefficient of skewness using the median, viz.,
by the formula :


Sk = 3(Mean - Median)/σ ...(*)


Mean = A + hΣfd/N = 85 + (10 × 778)/300 = 85 + 25.93 = 110.93


s.d. (σ) = h √[Σfd²/N - (Σfd/N)²] = 10 × √[3510/300 - (778/300)²]

         = 10 × √[11.7 - (2.5933)²] = 10 × √(11.7 - 6.7252)

         = 10 × √4.9748 = 10 × 2.23043 = 22.3043


N/2 = 300/2 = 150. The c.f. just greater than 150 is 170. Hence
the corresponding class 110-120 is the median class. Using the
median formula, we get :


Median = l + (h/f)(N/2 - C) = 110 + (10/50)(150 - 120)

       = 110 + 6 = 116
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Substituting these values in (*), we get 


Sk = 3(110.93 - 116)/22.3043 = 3 × (-5.07)/22.3043 = -15.21/22.3043 = -0.6819


Hence the given distribution is negatively skewed. 
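
The step-deviation method and the median formula used in Example 7.2 can be sketched in Python as follows. The function name `grouped_pearson` and its parameters are illustrative and assume classes of equal width; with unrounded intermediate values the coefficient comes to about -0.6815, against -0.6819 in the hand computation, which rounds at each step.

```python
from math import sqrt


def grouped_pearson(lower_limits, width, freqs, assumed_mean):
    """Mean, median, s.d. and Pearson's 3(M - Md)/sigma for equal-width classes."""
    n = sum(freqs)
    mids = [l + width / 2 for l in lower_limits]
    d = [(m - assumed_mean) / width for m in mids]        # step deviations
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = assumed_mean + width * sum_fd / n
    sd = width * sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    cum = 0                                               # c.f. before the class
    for l, f in zip(lower_limits, freqs):
        if cum + f >= n / 2:                              # median class found
            median = l + (width / f) * (n / 2 - cum)
            break
        cum += f
    return mean, median, sd, 3 * (mean - median) / sd


lower = [40, 50, 60, 70, 80, 90, 100, 110, 120, 130]
freqs = [5, 6, 8, 10, 25, 30, 36, 50, 60, 70]
mean, median, sd, sk = grouped_pearson(lower, 10, freqs, assumed_mean=85)
print(round(mean, 2), median, round(sd, 4), round(sk, 4))
```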


Example 7.3. Consider the following distributions : 


Distribution A Distribution B 
Mean 100 90 
Median 90 80 
Standard Deviation 10 10 


(i) Distribution A has the same degree of the variation as distri- 
bution B. 


(ii) Both distributions have the same degree of skewness. 
True/False ? Give reasons. [Delhi Uni. B.Com. (Hons.) 1974 ;
A.I.M.A. (Diploma in Management) Jan. 1979]


Solution.
(i) C.V. for distribution A = 100 × σA/X̄A = 100 × 10/100 = 10


C.V. for distribution B = 100 × σB/X̄B = 100 × 10/90 = 11.11


Since C.V. (B) > C.V. (A), the distribution B is more variable
than the distribution A. Hence the given statement that the
distribution A has the same degree of variation as distribution B is
wrong.

(ii) Karl Pearson's coefficient of skewness for the distributions
A and B is given by :


Sk (A) = 3(M - Md)/σ = 3(100 - 90)/10 = 3
Sk (B) = 3(M - Md)/σ = 3(90 - 80)/10 = 3


Since Sk(A) = Sk(B) = 3, the statement that both the
distributions have the same degree of skewness is true.


Example 7.4. In a moderately asymmetrical distribution, the
mode and mean are 32.1 and 35.4 respectively. Calculate the median.
(Delhi U. B.Com. 1982)
Solution. For a moderately asymmetrical distribution we have :
Mo = 3Md - 2M
=> 32.1 = 3Md - 2 × 35.4 = 3Md - 70.8
=> 3Md = 32.1 + 70.8 = 102.9
=> Md = 102.9/3 = 34.3


Example 7.5. Pearson's coefficient of skewness for a
distribution is 0.4 and its coefficient of variation is 30%. Its mode is 88,
find mean and median.


(Bombay U. B.Com. April 1983)
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Solution. We are given Mode=Mo=88, Karl Pearson's 
Coefficient of skewness is : 
Sk = (M - Mo)/σ = (M - 88)/σ = 0.4 (given)

C.V. = 100σ/M = 30 (given) ...(*)

=> M - 88 = 0.4σ and σ = 30M/100 = 0.3M
=> M - 88 = 0.4 × 0.3M = 0.12M
=> M - 0.12M = 88
=> (1 - 0.12)M = 88
=> M = 88/0.88 = 100


Substituting in (*),
σ = 0.3 × 100 = 30


Using the empirical relation between mean, median and mode
for a moderately asymmetrical distribution, viz.,


Mo = 3Md - 2M => 3Md = Mo + 2M

we get

Md = (1/3)(Mo + 2M) = (1/3)(88 + 2 × 100) = 288/3 = 96
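
The algebra of Example 7.5 amounts to solving two equations in M and σ and then applying the empirical relation. A minimal Python sketch, with a function name chosen only for illustration, is:

```python
def mean_sd_median_from_mode(sk, cv_percent, mode):
    """Solve (M - Mo)/sigma = sk and 100*sigma/M = cv_percent for M and sigma,
    then estimate the median from Mo = 3Md - 2M."""
    ratio = cv_percent / 100            # sigma = ratio * M
    mean = mode / (1 - sk * ratio)      # from M - Mo = sk * ratio * M
    sd = ratio * mean
    median = (mode + 2 * mean) / 3      # empirical relation
    return mean, sd, median


mean, sd, median = mean_sd_median_from_mode(sk=0.4, cv_percent=30, mode=88)
print(round(mean, 2), round(sd, 2), round(median, 2))
```

The median obtained this way is only an estimate, since the empirical relation holds for moderately asymmetrical distributions.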


Example 7.6. Pearson's measure of skewness of a distribution
is 0.5. Its median and mode are respectively 42 and 36. Find the
Coefficient of Variation.

[I.C.W.A. (Intermediate) December 1984]

Solution. We are given :

Median = 42, Mode = 36 ...(*)

and Pearson's coefficient of skewness = 0.5

=> Sk = (Mean - Mode)/σ = 0.5 ...(**)


To find s.d. (σ), we shall first find the value of mean, by using
the empirical relationship between mean, median and mode for a
moderately asymmetrical distribution, viz.,


Mode = 3 Median - 2 Mean => Mean = (3 Median - Mode)/2

Mean = (3 × 42 - 36)/2 = (126 - 36)/2 = 90/2 = 45

Substituting in (**) we get

s.d. (σ) = (Mean - Mode)/0.5 = (45 - 36)/0.5 = 9/0.5 = 18

Coefficient of Variation (C.V.) is given by :

C.V. = 100σ/Mean = (100 × 18)/45 = 40. Hence C.V. is 40.
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Example 7.7. The following information was obtained from 
the records of a factory relating to the wages. 


                        Rs.
Arithmetic Mean        56.80
Median                 59.50
S.D.                   12.40


Give as much information as you can about the distribution of
wages.


Solution. We can obtain the following information from the
above data :


(i) Since Md = Rs. 59.50, we conclude that 50% of the
workers in the factory obtain wages above Rs. 59.50.


(ii) C.V. = 100σ/M = (100 × 12.40)/56.80 = 21.83


(iii) Karl Pearson's coefficient of skewness is given by :


Sk = 3(M - Md)/σ = 3(56.80 - 59.50)/12.40 = 3 × (-2.70)/12.40

   = -8.1/12.40 = -0.65

(iv) Using the empirical relation between M, Md and Mo for
a moderately asymmetrical distribution we get :
Mo = 3Md - 2M = 3 × 59.50 - 2 × 56.80
   = 178.50 - 113.60 = Rs. 64.90

Example 7.8. You are given the position in a factory before
and after the settlement of an industrial dispute. Comment on the
gains or losses from the point of view of workers and that of
management.


                                 Before     After
No. of workers                   3,000      2,900
Mean wage (in Rs.)                 220        230
Median wage (in Rs.)               250        240
Standard deviation (in Rs.)         30         26


[Osmania U. B. Com. (Hons.) Nov. 1981, Delhi U. M.B.A. 1975]


Solution. On the basis of the above data we are in a position 
to make the following comments : 


(i) The number of workers after the dispute has decreased
from 3000 to 2900. Obviously this is a definite loss to the persons
thrown out or retrenched. It may also be a loss to the management
if their retrenchment affects the efficiency of work adversely.


(ii) We know that

Average wage = Total wages paid / Total No. of workers


=> Total wages paid = (Average wage) × (Total No. of workers)

Hence,
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Total wages paid by the management before the dispute 
= 3000x220 = Rs. 660000 

Total wages paid by the management after the dispute 
= 2900 x 230 = Rs. 667000 


Thus we see that the total wages paid by the management
have gone up after the dispute (the additional wage bill being
Rs. 7000), although the number of workers has been reduced from
3000 to 2900. This is due to the fact that the average wage per
worker has increased after the dispute, which is a distinct
advantage to the workers.

It may be pointed out that the increased wages paid by the
management (Rs. 7000) should not be viewed as a disadvantage to
the management unless we have definite reasons to believe that the
efficiency and productivity have not gone up after the dispute.
However, the loss to the management due to the higher wage bill will be
more than compensated if, after the dispute, there is a definite
increase in the efficiency of the workers or/and an increase in
productivity.

(iii) Although the number of workers has decreased from 3000
to 2900 after the dispute, the average wage per worker has gone up
from Rs. 220 to Rs. 230. This might probably be a consequence of the
retrenchment of casual or temporary labour working on
daily wages or so with relatively lower wages.

(iv) The median wage after the dispute has come down from
Rs. 250/- to Rs. 240/-. This implies that before the dispute the upper
50% of the workers were getting wages above Rs. 250/- whereas
after the dispute they get wages only above Rs. 240/-.

(v) Using the empirical relation between mean, median and
mode (for a moderately asymmetrical distribution), viz.,

Mode = 3 Median - 2 Mean,
we get :
Mode (before dispute) = 3 × 250 - 2 × 220 = Rs. 310
Mode (after dispute) = 3 × 240 - 2 × 230 = Rs. 260

Thus, we find that the modal wage has come down from
Rs. 310 (before dispute) to Rs. 260 (after dispute). Thus after the
dispute there is concentration of wages around a much smaller
value.


(vi) C.V. = 100 × (s.d.)/Mean

C.V. (before dispute) = (100 × 30)/220 = 13.64
and C.V. (after dispute) = (100 × 26)/230 = 11.30


Since C.V. has decreased from 13.64 to 11.30, the distribution
of wages has become less variable, i.e., more consistent or uniform,
after the settlement of the dispute. Thus after the settlement there
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are less disparities in wages and from management point it will 
result in greater satisfaction to the workers. 


(vii) Since we are given mean and median, we can calculate
Karl Pearson's coefficient of skewness for studying the symmetry of
the distribution of wages before the dispute and after the dispute,
viz., Sk = 3(M - Md)/σ


Sk (before dispute) = 3(220 - 250)/30 = -3


Sk (after dispute) = 3(230 - 240)/26 = -1.15

Thus the highly negatively skewed distribution (before the
dispute) has become a moderately negatively skewed distribution
(after the dispute). This implies that the curve of distribution of
wages after the dispute has a shorter tail towards the left. In
other words the number of workers getting lower wages has
increased.
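
The comparisons made in parts (ii), (v), (vi) and (vii) of Example 7.8 use the same four formulae for each period, so they can be collected in one small Python sketch. The function name `wage_summary` and the dictionary keys are illustrative choices only.

```python
def wage_summary(workers, mean, median, sd):
    """Total wage bill, empirical mode, C.V. and Pearson's coefficient."""
    return {
        "total_wages": workers * mean,
        "mode": 3 * median - 2 * mean,      # Mode = 3 Median - 2 Mean
        "cv": 100 * sd / mean,
        "sk": 3 * (mean - median) / sd,
    }


before = wage_summary(3000, 220, 250, 30)
after = wage_summary(2900, 230, 240, 26)
for key in before:
    print(key, round(before[key], 2), round(after[key], 2))
```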


EXERCISE 7.1
1. Define Pearson's measure of skewness. What is the difference between
a relative measure and the absolute measure of skewness ?
[Delhi Uni. B.A. Eco. (Hons.) 1971]
2. From the following data find out Karl Pearson's coefficient of skewness.
Measurement : 10 11 12 13 14 15
Frequency : 2 4 10 8 5 1
(Guru Nanak Dev. Uni. B. Com. II, Sept. 1982)
Ans. 0.3478


3. Find Karl Pearson's coefficient of skewness based on Mean and Mode 
from the following information : 


Size Frequency Size Frequency 

0—10 10 40—50 16 
10—20 12 50—60 14 
20—30 18 60—70 8 
30—40 25 


(Himachal Pradesh Uni. B. Com. April 1981) 
Ans. Mean = 34.61, Mode = 34.38, σ = 17.06, Sk = 0.0135.


4. Scores at an aptitude test by 100 candidates are given below. You
are required to calculate Karl Pearson's Coefficient of skewness.


Marks No. of Candidates Marks No. of Candidates 


0—10 10 40—50 10 
10—20 15 50—60 10 
20—30 24 60 —70 6 
30—40 25 


(Delhi U. B.Com. External 1979) 
Ans. Sk = 0.0476
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5. Calculate Karl Pearson's Coefficient of skewness from the following 
table and explain its significance : 


Wages No. of Persons Wages No. of Persons 
70-80 12 110-120 50
80-90 18 120-130 45
90-100 35 130-140 20
100-110 42 140-150 8


[Punjabi U. M.A. (Econ.) 1979, Kurukshetra U. B.Com. Sept. 1981]
Ans. M = Rs. 110.43, Mo = Rs. 116.15, σ = Rs. 17.26, Sk = -0.33


6. Calculate Karl Pearson's coefficient of skewness from the following
data :
Class Frequency Class Frequency
70-80 18 30-40 21
60-70 22 20-30 11
50-60 30 10-20 6
40-50 35 0-10 5
[Delhi Uni. B. Com. 1978]
Ans. Skewness = 0.81/17.7 = 0.046
7. Calculate Karl Pearson's Coefficient of skewness from the 
following data. 
Class Frequency Class Frequency 
40—60 25 10—15 6 
30—40 15 5—10 4 
20—30 12 3-5 3 
15—20 8 0-3 2 


(Delhi U. B.Com. 1983) 
Hint. Since classes are of unequal magnitudes, we estimate the value of
mode by using :


Mo = 3Md - 2M, so that Sk = 3(Mean - Median)/σ


Ans. Mean = 31.13, Median = 31.67, σ = 16.06, Sk = -0.1


8. Calculate Karl Pearson’s Coefficient of skewness from the follow- 
ing series : 


Wt. in kgs : Below 40 40-50 50-60 60-70 70-80
No. of persons : 10 16 18 25 20
Wt. in kgs : 80-90 90-100 100 and above
No. of persons : 4 4 3
[Guru Nanak Dev. U. B.Com. 1980]
Hint. Take the first class as 30-40.
Ans. Sk = -0.2155


9. Calculate Karl Pearson's coefficient of skewness from the following
data :


Marks (above) : 0 10 20 30 40 50 60 70 80
No. of students : 150 140 100 80 80 70 30 14 0


[Karnataka U. B. Com. April 1981 ; Kurukshetra U. B. Com. 1980 ; Sept. 1978]


Hint. Locate mode by the method of grouping ; mode ill-defined. Find
median.


Ans. Sk = 3(M - Md)/σ = -0.6622
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- 10. Find out the mean deviation, standard deviation and quartile 
deviation from the following table. Also calculate Karl Pearson's coefficient of
skewness.


Wages (in Rs.) No. of labourers Wages (in Rs.) No. of labourers
Above 200 685 Above 240 209
Above 210 500 Above 250 73
Above 220 423 Above 260 50
Above 230 389 Above 270 0


Ans. Q.D. = Rs. 16.75, M.D. (from mean) = Rs. 16.50, Sk = -0.46.


11. The following facts are gathered before and after an industrial 
dispute : 


                           Before dispute   After dispute
No. of workers employed         515              509
Mean wages                    Rs. 49.50        Rs. 52.75
Median wages                  Rs. 52.80        Rs. 50.00
Variance of wages              121.00           144.00


Compare the position before and after the dispute in respect of (a) total
wages (b) modal wages (c) standard deviation and (d) skewness.


[I.C.W.A. (Intermediate) December 1981]


                        Before dispute    After dispute
Ans. (i) Total wages     Rs. 25492.50     Rs. 26849.75
(ii) Modal wages         Rs. 59.40        Rs. 44.50
(iii) C.V.               22.22            22.74
(iv) Skewness            -0.90            0.69


12. For a group of 10 items ΣX = 452, ΣX² = 24270 and Mode = 43.7.
Find the Pearsonian coefficient of skewness.


Ans. Sk = 0.08


13. If the mode and mean of a moderately asymmetrical series are
respectively 16 inches and 15.6 inches, what would be its most probable median ?
(Osmania Uni. B. Com. III Oct., 1983)
Ans. Median = 15.73 inches.


14. In a slightly skew distribution the arithmetic mean is Rs. 45 and the
median is Rs. 48. Find the approximate value of mode.
(Lucknow U. B. Com. 1980)
Ans. Mode = Rs. 54.


15. From the data given below calculate the coefficient of variation.
Pearson's measure of skewness = 0.42
Arithmetic mean = 86
Median = 80


Ans. C.V. = 49.84


16. In a distribution mean = 65, median = 70 and the coefficient of
skewness is -0.6. Find mode and coefficient of variation. (Assume that the
distribution is moderately asymmetrical).


[Delhi U. B.A. (Econ. Hons. I), O.C. 1983]
Ans. Mode = 80 ; C.V. = 38.46
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17. Ina certain distribution, the following results were obtained : 


Arithmetic Mean (X̄) = 45
Median = 48
Coefficient of skewness = -0.4


The person who gave you this data failed to give you S.D. (Standard
Deviation). You are required to estimate it with the help of the above data.


[Delhi U. B.Com. (External) 1982]
Ans. σ = 22.5


18. Karl Pearson's coefficient of skewness of a distribution is 0.32. Its
standard deviation is 6.5 and mean is 29.6. Find the mode and median of the
distribution.


If the mode of the above is 24.8, what will be the standard deviation ?
[I.C.W.A. (Intermediate), June 1983 ; Sambalpur U. B.Com. 1982]

Ans. Mode = 27.52, Median = 28.91, σ = 15.

19. For the weight distribution of 100 students, mean and variance were 


45 and 49 kg. respectively. If the value of Karl Pearson's coefficient of skewness
is -0.4, find the value of the coefficient of variation.


(Bombay U. B.Com. Nov. 1980)
Ans. C.V. = 22.22.
20. A frequency distribution is positively skewed. The mean of the 

distribution is : 

Greater than the mode, Less than the mode, 
Equal to the mode, None of these. 
Tick the correct answer. 

[C.A. (Intermediate) May 1982] 
Ans. M > Mo


21. In a moderately skewed distribution the values of mean and median
are 5 and 6 respectively. The value of mode in such a situation is approximately
equal to......


(a) 8 (b) 11 (c) 16 (d) None of these.
[C.A. (Intermediate) Nov. 1983]
Ans. (a) : 8.
22. Which group is more skewed ?
(i) Mean = 22 ; Median = 24 ; s.d. = 10
(ii) Mean = 22 ; Median = 25 ; s.d. = 12
[I.C.W.A. (Intermediate) June 1983]


Ans. Sk (i) = -0.6 ; Sk (ii) = -0.75. Group (ii) is more skewed to the
left.


23. What is the relationship between mean, mode, and median ? What
is the condition under which this relationship holds ? Locate graphically the
position of the three measures in the case of both negatively as well as positively
skewed distribution.

[Delhi U. B.A. (Econ. Hons.) 1982]


Ans. Mode = 3 Median - 2 Mean, for a moderately asymmetrical
distribution.
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24. The value of two measures of central tendency are given below for 
three distributions. For each distribution estimate whether it is approximately 
symmetrical, negatively skewed or positively skewed. (Assume each distribution 
is unimodal). Sketch a figure showing what you expect each distribution to look 
like and where you expect all three measures of central tendency (mean, median, 
mode) to be located. 


(i) Mode = 50, X̄ = 58
(ii) Median = 71.4, X̄ = 71.4
(iii) Median = 104, X̄ = 96.
[Delhi U. B.A. (Econ. Hons. I) 1984]
Ans. (i) Positively skewed ; (ii) Symmetrical ; (iii) Negatively skewed.


2. Bowley's Coefficient of Skewness. Prof. A.L. Bowley's
coefficient of skewness is based on the quartiles and is given by :

Sk = [(Q3 - Md) - (Md - Q1)] / [(Q3 - Md) + (Md - Q1)] ...(7.8)

=> Sk = (Q3 + Q1 - 2Md)/(Q3 - Q1) ...(7.9)

Remarks 1. Bowley's coefficient of skewness is also known as
Quartile coefficient of skewness and is especially useful in situations
where quartiles and median are used, viz. :


(i) When the mode is ill-defined and extreme observations are
present in the data.


(ii) When the distribution has open end classes or unequal
class intervals.


In these situations Pearson's coefficient of skewness cannot
be used.


2. From (7.9), we observe that :
Sk = 0, if Q3 - Md = Md - Q1 ...(7.10)
This implies that for a symmetrical distribution (Sk = 0),
median is equidistant from the upper and lower quartiles. Moreover,
skewness is positive if :

Q3 - Md > Md - Q1 => Q3 + Q1 > 2Md ...(7.11)
and skewness is negative if
Q3 - Md < Md - Q1 => Q3 + Q1 < 2Md ...(7.12)


3. Limits for Bowley's Coefficient of Skewness.

|Sk (Bowley)| ≤ 1
=> -1 ≤ Sk (Bowley) ≤ 1. ...(7.13)
Thus, Bowley's coefficient of skewness ranges from -1 to 1.
Further, we note from (7.8) that :
Sk = +1, if Md - Q1 = 0, i.e., if Md = Q1 ...(7.14)
and Sk = -1, if Q3 - Md = 0, i.e., if Q3 = Md ...(7.15)



4. It should be clearly understood that the values of the
coefficient of skewness obtained by Bowley's formula and Pearson's
formula are not comparable, although in each case, Sk = 0 implies
the absence of skewness, i.e., the distribution is symmetrical. It may
even happen that one of them gives positive skewness while the
other gives negative skewness. (See Example 7.12.)


5. In Bowley's coefficient of skewness the disturbing factor
of variation is eliminated by dividing the absolute measure of
skewness, viz., (Q3 - Md) - (Md - Q1) by the measure of dispersion
(Q3 - Q1), i.e., the interquartile range.

6. The only and perhaps quite serious limitation of this
coefficient is that it is based only on the central 50% of the data and
ignores the remaining 50% of the data towards the extremes.
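
Formula (7.9) and the limiting cases (7.14) and (7.15) can be illustrated with a short Python sketch. The function name `bowley_skewness` and the quartile values used below are hypothetical and serve only to show the behaviour at the limits.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's quartile coefficient of skewness, formula (7.9)."""
    if q3 <= q1:
        raise ValueError("Q3 must exceed Q1")
    return (q3 + q1 - 2 * median) / (q3 - q1)


print(bowley_skewness(10, 15, 20))   # median midway between quartiles
print(bowley_skewness(10, 10, 20))   # Md = Q1, upper limit
print(bowley_skewness(10, 20, 20))   # Md = Q3, lower limit
```

Since the median always lies between Q1 and Q3, the numerator can never exceed the denominator in magnitude, which is why the coefficient stays within the limits ± 1.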


3. Kelly's Measure of Skewness. The drawback of Bowley's
coefficient of skewness (viz., that it ignores the 50% of the data
towards the extremes) can be partially removed by taking two deciles
or percentiles equidistant from the median value. The refinement
was suggested by Kelly. Kelly's percentile (or decile) measure
of skewness is given by :

Sk = (P90 - P50) - (P50 - P10)
   = P90 + P10 - 2P50 ...(7.16)

But P90 = D9 and P10 = D1. Hence, (7.16) can be re-written
as :

Sk = (D9 - D5) - (D5 - D1)
   = D9 + D1 - 2D5 ...(7.16a)

where Pi and Di are the ith percentile and decile respectively of the given
distribution.


(7.16) or (7.16a) gives an absolute measure of skewness. However,
for practical purposes, we generally compute the coefficient of
skewness, which is given by :

Sk (Kelly) = [(P90 - P50) - (P50 - P10)] / [(P90 - P50) + (P50 - P10)]

           = (P90 + P10 - 2P50)/(P90 - P10) ...(7.17)

           = (D9 + D1 - 2D5)/(D9 - D1) ...(7.17a)

Remarks 1. We have : D5 = P50 = Median. Hence

Sk (Kelly) = (P90 + P10 - 2Md)/(P90 - P10) ...(7.18)

           = (D9 + D1 - 2Md)/(D9 - D1) ...(7.18a)
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_2. This method is primarily of theoretical importance only 
and is seldom used in practice. 
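
For completeness, Kelly's coefficient can be sketched in Python in the same way as Bowley's. The function names are illustrative; the second helper obtains the deciles from raw data with `statistics.quantiles` from the standard library, whose default interpolation rule is one of several conventions and need not agree exactly with the grouped-data formulae of this chapter.

```python
from statistics import quantiles


def kelly_skewness(p10, p50, p90):
    """Kelly's percentile coefficient of skewness, formula (7.18)."""
    return (p90 + p10 - 2 * p50) / (p90 - p10)


def kelly_from_data(data):
    """Kelly's coefficient from raw observations via the nine deciles."""
    d = quantiles(data, n=10)        # d[0] = D1, d[4] = D5, d[8] = D9
    return kelly_skewness(d[0], d[4], d[8])


# A perfectly symmetrical set of values gives zero skewness
print(kelly_from_data(list(range(1, 100))))
```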


4. Coefficient of Skewness based on Moments. This coefficient
is based on the 2nd and 3rd moments about mean and is
discussed in detail in §7.5.


Example 7.9. Calculate Bowley’s coefficient of skewness of 
the following data : 


Weight (lbs) No. of persons Weight (lbs) No. of persons 


Under 100 1 150—159 65 
100—109 14 160—169 31 
110—119 66 170—179 12 
120—129 122 180—189 5 
130—139 145 190—199 2 
140—149 121 200 and over 2 


[Delhi U. B.Com. 1981] 


Solution. Here we are given the frequency distribution with
inclusive type classes. Since the formulae for median and quartiles
are based on a continuous frequency distribution with exclusive type
classes without any gaps, we obtain the class boundaries which are
given in the last column of the following table.


COMPUTATION OF QUARTILES


Weight      No. of persons   'Less than'   Class boundaries
(lbs)            (f)            c.f.

  0-99             1              1           0-99.5
100-109           14             15          99.5-109.5
110-119           66             81         109.5-119.5
120-129          122            203         119.5-129.5
130-139          145            348         129.5-139.5
140-149          121            469         139.5-149.5
150-159           65            534         149.5-159.5
160-169           31            565         159.5-169.5
170-179           12            577         169.5-179.5
180-189            5            582         179.5-189.5
190-199            2            584         189.5-199.5
200-209            2            586         199.5-209.5


Here N = 586, N/4 = 146.5, N/2 = 293, 3N/4 = 439.5


The c.f. just greater than N/2, i.e., 293 is 348. Hence the
corresponding class 129.5-139.5 is the median class. Using the
median formula, we get :

Md (Q2) = 129.5 + (10/145)(293 - 203) = 129.5 + (10 × 90)/145

        = 129.5 + 6.2069 = 135.7069 ≈ 135.71

The c.f. just greater than N/4, i.e., 146.5 is 203. Hence the
corresponding class 119.5-129.5 contains Q1.

Q1 = 119.5 + (10/122)(146.5 - 81) = 119.5 + (10 × 65.5)/122

   = 119.5 + 5.3688 = 124.8688 ≈ 124.87


The c.f. just greater than 3N/4, i.e., 439.5 is 469. Hence the
corresponding class 139.5-149.5 contains Q3.


Q3 = 139.5 + (10/121)(439.5 - 348) = 139.5 + (10 × 91.5)/121

   = 139.5 + 7.5620 = 147.0620 ≈ 147.06

Bowley's coefficient of skewness is given by :


Sk (Bowley) = (Q3 + Q1 - 2Md)/(Q3 - Q1)

            = (147.06 + 124.87 - 2 × 135.71)/(147.06 - 124.87)

            = (271.93 - 271.42)/22.19 = 0.51/22.19 = 0.023

Example 7.10. Calculate the coefficient of skewness from the 
following data by using quartiles. 


Marks No. of students Marks No. of students 
Above 0 180 Above 60 65 
Above 15 160 Above 75 20 
Above 30 130 Above 90 5 
Above 45 100 


[I.C.W.A. (Intermediate) June 1983]


Solution. We are given a 'more than' cumulative frequency
distribution. To compute quartiles, we first express it as a
continuous frequency distribution without any gaps as given in the
following table :


COMPUTATION OF QUARTILES


Marks            Frequency (f)        'Less than' c.f.

 0-15           180 - 160 = 20              20
15-30           160 - 130 = 30              50
30-45           130 - 100 = 30              80
45-60           100 -  65 = 35             115
60-75            65 -  20 = 45             160
75-90            20 -   5 = 15             175
90 and above                  5            180

N/2 = 180/2 = 90, N/4 = 180/4 = 45 and 3N/4 = 135

The c.f. just greater than N/2, i.e., 90 is 115. Hence the
corresponding class 45-60 is the median class.

Md = l + (h/f)(N/2 - C) = 45 + (15/35)(90 - 80)

   = 45 + 150/35 = 45 + 4.29 = 49.29

The c.f. just greater than N/4, i.e., 45 is 50. Hence the
corresponding class 15-30 contains Q1.


Q1 = l + (h/f)(N/4 - C) = 15 + (15/30)(45 - 20)

   = 15 + 12.5 = 27.5


The c.f. just greater than 3N/4, i.e., 135 is 160. Hence the
corresponding class 60-75 contains Q3.


Q3 = l + (h/f)(3N/4 - C) = 60 + (15/45)(135 - 115)

   = 60 + 20/3 = 60 + 6.67 = 66.67


Hence Bowley's coefficient of skewness based on quartiles is
given by :

Sk (Bowley) = (Q3 + Q1 - 2Md)/(Q3 - Q1) = (66.67 + 27.50 - 2 × 49.29)/(66.67 - 27.50)

            = (94.17 - 98.58)/39.17 = -4.41/39.17 = -0.1126

Example 7.11. In a frequency distribution the coefficient of
skewness based on quartiles is 0.6. If the sum of the upper and lower
quartiles is 100 and median is 38, find the value of the upper quartile.

[Delhi U. B.A. (Econ. Hons.) OC. 1983 ; 
Delhi Uni. B.Com. 1978] 


Solution. We are given :


Sk = (Q3 + Q1 - 2Md)/(Q3 - Q1) = 0.6 ...(*)


Also Q3 + Q1 = 100 and Median = 38
Substituting in (*) we get

(100 - 2 × 38)/(Q3 - Q1) = 0.6

=> Q3 - Q1 = (100 - 76)/0.6 = 24/0.6 = 40

Thus we have :
Q3 + Q1 = 100
and Q3 - Q1 = 40


Adding, we get :
2Q3 = 140 => Q3 = 140/2 = 70


Hence the value of the upper quartile is 70.
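
The same two-equation argument applies whenever Bowley's coefficient, the sum of the quartiles and the median are given. A small Python sketch (the function name is an illustrative choice) that returns both quartiles is:

```python
def quartiles_from_bowley(sk, quartile_sum, median):
    """Recover Q1 and Q3 from Bowley's coefficient, Q3 + Q1 and the median."""
    spread = (quartile_sum - 2 * median) / sk   # Q3 - Q1, from formula (7.9)
    q3 = (quartile_sum + spread) / 2
    q1 = (quartile_sum - spread) / 2
    return q1, q3


q1, q3 = quartiles_from_bowley(sk=0.6, quartile_sum=100, median=38)
print(round(q1, 2), round(q3, 2))
```

The sketch assumes a non-zero coefficient; when Sk = 0 the quartile spread cannot be recovered from this information alone.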
Example 7.12. From the information given below, calculate
Karl Pearson's coefficient of skewness and also quartile coefficient of
skewness :


Measure                Place A     Place B
Mean                     150         140
Median                   142         155
Standard deviation        30          55
Third quartile           195         260
First quartile            62          80


(Osmania Uni. B.Com., Nov. 1978)
Solution. Place A :


Sk (Karl Pearson) = 3(M - Md)/σ = 3(150 - 142)/30 = 8/10 = 0.8


Sk (Bowley) = (Q3 + Q1 - 2Md)/(Q3 - Q1) = (195 + 62 - 2 × 142)/(195 - 62)

            = (257 - 284)/133 = -27/133 = -0.203

Place B :

Sk (Karl Pearson) = 3(M - Md)/σ = 3(140 - 155)/55 = -45/55 = -0.82


Sk (Bowley) = (Q3 + Q1 - 2Md)/(Q3 - Q1) = (260 + 80 - 2 × 155)/(260 - 80)

            = (340 - 310)/180 = 30/180 = 0.167

Note : See Remark 4 on Bowley's coefficient of skewness.
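
That the two coefficients need not agree even in sign can be checked numerically. The sketch below uses the Place A figures and the quartiles of Place B from Example 7.12; the function names are illustrative.

```python
def pearson_median_form(mean, median, sd):
    """Karl Pearson's coefficient in the form 3(M - Md)/sigma."""
    return 3 * (mean - median) / sd


def bowley(q1, median, q3):
    """Bowley's quartile coefficient of skewness."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


sk_pearson_a = pearson_median_form(150, 142, 30)   # Place A
sk_bowley_a = bowley(62, 142, 195)                 # Place A
sk_bowley_b = bowley(80, 155, 260)                 # Place B
print(round(sk_pearson_a, 3), round(sk_bowley_a, 3), round(sk_bowley_b, 3))
```

For Place A the Pearson coefficient is positive while Bowley's is negative, which is the situation described in Remark 4.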
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EXERCISE 7.2 


1. What is 'Skewness' ? What are the tests of skewness ? Draw different
sketches to indicate different types of skewness and locate roughly the
relative positions of mean, median and mode in each case.


[Bombay Uni. B. Com. Oct. 1971]


2. Give any three measures of skewness of a frequency distribution. 
Explain briefly (not exceeding ten lines) with suitable diagrams the term ‘Skew- 
ness’ as mentioned above. 

[I.C.W.A. (Intermediate) Nov. 1977] 


3. (a) State the empirical relationship among mode, median and
arithmetic mean of a unimodal symmetrical and a moderately asymmetrical frequency
distribution.
[Delhi U. B.Com. (Hons.) 1980]

(b) Show by means of sketches the relative positions of mean, median
and mode for frequency curves which are skewed to the right and left
respectively.

4. State the empirical relationship among mean, median and mode in a
symmetrical and a moderately asymmetrical frequency distribution. How does
it help in estimating mode and measuring skewness ?

[Delhi U. B.Com. ( Hons.) 1983] 

5. Explain the meaning of skewness using sketches of frequency curves.
State the different measures of skewness that are commonly used. How does
skewness differ from dispersion ?

[Bombay U. B.Com. April 1981) 

6. (a) What are quartiles ? How are they used to measure skewness ?


(b) Describe Bowley's measure of skewness. Show that it lies between
± 1. Under what conditions are these limits attained ?


7. The weekly wages earned by one hundred workers of a factory are
as follows. Find the absolute measures of dispersion and skewness based on
quartiles and interpret the results.


Weekly wages (Rs.) No. of workers Weekly wages (Rs.) No. of workers


12.5-17.5 12 37.5-42.5 10
17.5-22.5 16 42.5-47.5 6
22.5-27.5 25 47.5-52.5 3
27.5-32.5 14 52.5-57.5 1
32.5-37.5 13


[Karnataka U. B. Com. April 1982]
Ans. Q.D. = 7.47 ; Sk = (Q3 - Md) - (Md - Q1) = 4.26.
8. Find out the coefficient of skewness using Bowley's formula from
the following figures :
Income (in Rs.) No. of Persons Income (in Rs.) No. of Persons


100-199 39 500-599 38
200-299 25 600-699 37
300-399 49 700-799 32
400-499 62 800-899 18


[Guru Nanak Dev. U. B. Com. 1979]


Ans. Sk = 0.1146.
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(b) The Coefficient of skewness for a certain distribution based on 
the quartiles is -0.8. If the sum of the upper and lower quartiles is 100.7 and
median is 55.35, find the values of the upper and lower
quartiles.

[I.C.W.A. (Intermediate) June 1984]

Ans. Q3 = 56.6 ; Q1 = 44.1

14. In a distribution, the difference of the two quartiles is 15 and their
sum is 35 and the median is 20. Find the Coefficient of skewness.

[Mysore U. B.Com. April 1931]

Ans. Sk (Bowley) = -0.33


15. If the first quartile is 142 and the semi-interquartile range is 18,
find the median (assuming the distribution to be symmetrical about mean or
median).


[I.C.W.A. Intermediate, Dec. 1978]
Ans. 160


16. Fill in the blank :


"If Q1 = 6, Q3 = 10 and Bowley's Coefficient of skewness is 0.5, then the
value of median will be equal to ......"


[C.A. (Intermediate) May 1984]
Ans. Md = 7


17. For a distribution Bowley's coefficient of skewness is -0.36, Q1 = 8.6
and median = 13.3. What is the quartile coefficient of dispersion ?


Ans. Quartile coeff. of dispersion = 0.24


18. The statistical constants given below relate to two distributions A
and B. Comment on their dispersion and skewness.

                      A        B
Median              19.6     24.4
Lower Quartile      13.5     15.6
Upper Quartile      26       37.8

Ans. Q.D. (A) = 6.25; Q.D. (B) = 11.1. On the basis of Q.D., distribution
B is more variable than A.

Bowley's coeff. (A) = 0.0024; Bowley's coefficient (B) = 0.019

Both the distributions are positively skewed. Distribution B is more
skewed than A.


19. Particulars relating to the wage distribution of two manufacturing
firms are as follows :

                  Firm 'A'       Firm 'B'
                    Rs.            Rs.
Mean wage           175            180
Median wage         172            170
Modal wage          167            162
Quartiles        162 ; 178      165 ; 185
S.D.                 13             19

Compare the two distributions.

[Guru Nanak Dev. U. B.Com. Sept. 1981]

Ans. C.V. (Firm A) = 7.43, C.V. (Firm B) = 10.56; Bowley's Coeff. of
skewness (Firm A) = $-0.25$, skewness (Firm B) = 0.5; Karl Pearson Coeff. of
skewness (Firm A) = 0.615, skewness (Firm B) = 0.947.


9. Following figures relate to the size of capital of companies :

Capital in Lakhs of Rs.     No. of Companies
 1–5                              20
 6–10                             27
11–15                             29
16–20                             38
21–25                             48
26–30                             53
31–35                             70

Find out (i) the median size of the capital; (ii) the coefficient of skewness
with the help of Bowley's measure of skewness. What conclusion do you
draw from the skewness measured by you ?

[Delhi Uni. B.Com. 1974; Guru Nanak Dev. Uni. B.Com. 1977]

Ans. Median = 23.47; Sk (Bowley) = $-0.119$


10. For the frequency distribution given below, calculate the coefficient
of skewness based on the quartiles :

Class Limits   Frequency      Class Limits   Frequency
10–19              5          50–59             25
20–29              9          60–69             15
30–39             14          70–79              8
40–49             20          80–89              4

[I.C.W.A. (Intermediate) Dec. 1976]
Ans. $-0.103$
11. By using the quartiles, find a measure of skewness for the following
distribution.

Annual Sales (Rs. '000)   No. of firms      Annual Sales (Rs. '000)   No. of firms
Less than 20                   30           Less than 70                  644
Less than 30                  225           Less than 80                  650
Less than 40                  465           Less than 90                  665
Less than 50                  580           Less than 100                 680
Less than 60                  634

[I.C.W.A. (Intermediate) Dec. 1982]

Ans. Sk (Bowley) = 0.0903.
12. Calculate Bowley's coefficient of skewness for the following data :

Income equal to     No. of Persons      Income equal to     No. of Persons
or more than                            or more than
100                     1000            600                      200
200                      950            700                      150
300                      700            800                      100
400                      600            900                       50
500                      500            1000                       0

Assume that income is a continuous variable.
[Bombay Uni. B.Com. Nov. 1975]

Ans. Sk = 0.064
13. (a) The measure of skewness for a certain distribution is $-0.8$. If
the lower and the upper quartiles are 44.1 and 56.6 respectively, find
the median.
[I.C.W.A. (Final) 1971 (O.S.)]

Ans. Md = 55.35

7.3. Moments. Moment is a term generally used in physics or
mechanics and provides us a measure of the turning or the rotating
effect of a force about some point. The moment of a force, say, F
about some point P is given by the product of the magnitude of the
force (F) and the perpendicular distance (p) between the point of
reference and direction of the force, i.e.,

Moment = p × F

Fig. 7.3.

However, the term moment as used in physics has nothing to
do with the moment used in statistics, the only analogy being that
in statistics we talk of moment of a random variable about some point
and these moments are used to describe the various characteristics
of a frequency distribution viz., central tendency, dispersion,
skewness and kurtosis.


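Let the random variable X have a frequency distribution

$$\begin{array}{c|cccc|c} X & x_1 & x_2 & \cdots & x_n & \\ \hline f & f_1 & f_2 & \cdots & f_n & \sum f = N \end{array}$$

Let $\bar{x} = \frac{1}{N}\sum f x$ be its arithmetic mean.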


7.3.1. Moments about Mean. The $r$th moment of X about
the mean $\bar{x}$, usually denoted by $\mu_r$ [where $\mu$ is the letter Mu of the
Greek alphabet] is defined as

$$\mu_r = \frac{1}{N}\sum f\,(x-\bar{x})^r;\qquad r = 0, 1, 2, 3, \ldots \tag{7.19}$$

$$\phantom{\mu_r} = \frac{1}{N}\left[f_1(x_1-\bar{x})^r + f_2(x_2-\bar{x})^r + \cdots + f_n(x_n-\bar{x})^r\right] \tag{7.19a}$$

In particular, putting r = 0, we get

$$\mu_0 = \frac{1}{N}\sum f = 1 \tag{7.20}$$

Putting r = 1 in (7.19), we get

$$\mu_1 = \frac{1}{N}\sum f\,(x-\bar{x}) = 0, \tag{7.21}$$

because the algebraic sum of deviations of a given set of observations
from their mean is zero. Thus the first moment about mean is
always zero.

Again taking r = 2 we get

$$\mu_2 = \frac{1}{N}\sum f\,(x-\bar{x})^2 = \sigma^2 \tag{7.22}$$

Hence the second moment about mean gives the variance of the
distribution.

Also
$$\mu_3 = \frac{1}{N}\sum f\,(x-\bar{x})^3 \tag{7.23}$$
and
$$\mu_4 = \frac{1}{N}\sum f\,(x-\bar{x})^4 \tag{7.23a}$$
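
As an illustration of definition (7.19), the following is a minimal Python sketch. The function name `central_moment` is hypothetical (it is not part of any library), and the data are the four values 2, 4, 6, 8, each taken with frequency 1.

```python
from typing import Sequence


def central_moment(x: Sequence[float], f: Sequence[float], r: int) -> float:
    """r-th moment about the mean: (1/N) * sum of f * (x - mean)**r, as in (7.19)."""
    n = sum(f)                                        # N = total frequency
    mean = sum(fi * xi for xi, fi in zip(x, f)) / n   # arithmetic mean
    return sum(fi * (xi - mean) ** r for xi, fi in zip(x, f)) / n


x = [2, 4, 6, 8]
f = [1, 1, 1, 1]
moments = [central_moment(x, f, r) for r in range(5)]   # mu_0 .. mu_4
print(moments)  # [1.0, 0.0, 5.0, 0.0, 41.0]
```

The output reproduces the general results that the zeroth moment is 1 and the first moment about the mean is 0; the second moment, 5, is the variance of the four values.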


7.3.2. Moments about arbitrary point A. The $r$th moment of
X about any arbitrary point A, usually denoted by $\mu_r'$, is defined as :

$$\mu_r' = \frac{1}{N}\sum f\,(x-A)^r;\qquad r = 0, 1, 2, 3, \ldots \tag{7.24}$$

$$\phantom{\mu_r'} = \frac{1}{N}\left[f_1(x_1-A)^r + f_2(x_2-A)^r + \cdots + f_n(x_n-A)^r\right] \tag{7.24a}$$

In particular taking r = 0 and r = 1 in (7.24) we get respectively,

$$\mu_0' = 1 \tag{7.25}$$
$$\mu_1' = \frac{1}{N}\sum f\,(x-A) = \bar{x} - A \tag{7.26}$$
$$\Rightarrow\quad \bar{x} = A + \mu_1' \tag{7.27}$$

where $\mu_1'$ is the first moment about the point 'A'.
Taking r = 2, 3, 4 in (7.24) we get respectively

$$\mu_2' = \text{Second moment about } A = \frac{1}{N}\sum f\,(x-A)^2 \tag{7.28}$$
$$\mu_3' = \text{Third moment about } A = \frac{1}{N}\sum f\,(x-A)^3 \tag{7.28a}$$
$$\mu_4' = \text{Fourth moment about } A = \frac{1}{N}\sum f\,(x-A)^4 \tag{7.28b}$$

Remarks 1. $\mu_r$, the $r$th moments about the mean, are also called
the central moments and $\mu_r'$, the $r$th moments about any arbitrary
point A, are also known as raw moments.

2. In particular, if we take A = 0 in (7.27) we get
$$\bar{x} = 0 + \mu_1' \text{ (about origin)} \tag{7.29}$$
Hence the first moment about origin gives mean.
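
Definition (7.24) and result (7.27) can be checked numerically. The sketch below is only illustrative: `raw_moment` is a hypothetical helper name, and the data (2, 4, 6, 8 with unit frequencies, A = 4) are chosen for convenience.

```python
def raw_moment(x, f, a, r):
    """r-th moment about the point a: (1/N) * sum of f * (x - a)**r, as in (7.24)."""
    n = sum(f)
    return sum(fi * (xi - a) ** r for xi, fi in zip(x, f)) / n


x = [2, 4, 6, 8]
f = [1, 1, 1, 1]
a = 4                          # arbitrary point A
m1 = raw_moment(x, f, a, 1)    # first moment about A
mean = a + m1                  # result (7.27): mean = A + mu_1'
print(m1, mean)                # 1.0 5.0
```

Taking `a = 0` in the same function gives the moments about the origin, whose first member is the mean itself.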


7.3.3. Relation between Moments about Mean and Moments
about Arbitrary Point 'A'. We have

$$\mu_r = \mu_r' - {}^rC_1\,\mu_{r-1}'\,\mu_1' + {}^rC_2\,\mu_{r-2}'\,\mu_1'^2 - \cdots + (-1)^r\,\mu_1'^r \tag{7.30}$$

Remarks 1. We summarise below the important results on
moments :

$\mu_0 = 1$ and $\mu_1 = 0$

Mean $= \bar{x} = A + \mu_1'$

$$\text{Variance} = \sigma^2 = \mu_2 = \mu_2' - \mu_1'^2 \tag{7.31}$$
$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 \tag{7.32}$$
$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 \tag{7.33}$$

These results are of fundamental importance in Statistics and
should be committed to memory. Thus, if we know the first four
moments about any arbitrary point A, we can obtain the measures
of central tendency [$\bar{x} = \mu_1'$ (about origin)], dispersion ($\mu_2 = \sigma^2$),
skewness ($\mu_3$ or $\beta_1$) and kurtosis ($\beta_2$). [The last two measures are
discussed in the next sections § 7.5 and § 7.6.] Since these four
measures enable us to have a fairly good idea about the nature and
the form of the given frequency distribution, we generally compute
only the first four moments and not the higher moments.
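
Results (7.31) to (7.33) translate directly into a small routine. The following Python sketch is illustrative only; the function name `central_from_raw` is hypothetical, and the raw moments 1, 4, 10, 46 are simply convenient illustrative figures.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four raw moments (about any point A) into the
    central moments mu_2, mu_3, mu_4 using (7.31), (7.32) and (7.33)."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


mu2, mu3, mu4 = central_from_raw(1, 4, 10, 46)
print(mu2, mu3, mu4)  # 3 0 27
```

The point A itself does not enter these formulae; it is needed only to recover the mean as A plus the first raw moment.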


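2. For a symmetrical distribution, the sums of the odd powers of the
deviations from the mean vanish :

$$\sum f(x-\bar{x}) = \sum f(x-\bar{x})^3 = \sum f(x-\bar{x})^5 = \cdots = 0$$

Dividing by $N = \sum f$, we get for a symmetrical distribution :

$$\mu_1 = \mu_3 = \mu_5 = \cdots = 0$$
$$\Rightarrow\quad \mu_{2r+1} = 0;\qquad r = 0, 1, 2, 3, \ldots \tag{7.33a}$$

Hence for a symmetrical distribution, all the odd order moments
about mean vanish. Accordingly odd order moments, specially the 3rd
moment, are used as a measure of skewness.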


3. For obtaining the moments of a grouped (continuous)
frequency distribution, if we change the scale also in X by taking

$$d = \frac{x-A}{h} \quad\Rightarrow\quad x - A = hd \tag{7.33b}$$

then

$$\mu_r' = h^r \cdot \frac{1}{N}\sum f d^r \tag{7.33c}$$

In particular, we have

$$\mu_1' = h \cdot \frac{1}{N}\sum f d,\qquad \mu_2' = h^2 \cdot \frac{1}{N}\sum f d^2,\qquad \mu_3' = h^3 \cdot \frac{1}{N}\sum f d^3,\qquad \mu_4' = h^4 \cdot \frac{1}{N}\sum f d^4 \tag{7.33d}$$

Finally, on using the relationships (7.31) to (7.33) we obtain the
moments about mean.

For numerical computations, if the mean of the distribution
comes out to be integral (i.e., a whole number), then it is convenient
to obtain the moments about mean directly by the formula

$$\mu_r = \frac{1}{N}\sum f\,(x-\bar{x})^r \tag{7.33e}$$

However, for grouped (continuous) frequency distribution,
the calculations are simplified by changing the scale also in X. If
we take

$$z = \frac{x-\bar{x}}{h} \quad\Rightarrow\quad x - \bar{x} = hz \tag{7.33f}$$

then, we have

$$\mu_r = h^r \left[\frac{1}{N}\sum f z^r\right] \tag{7.33g}$$

a formula which is more convenient to use [c.f. Example 7.15 page
409].
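
The change of scale in (7.33f) and (7.33g) can be sketched in Python as follows. This is a minimal illustration, not a definitive routine: the name `central_moments_scaled` is hypothetical, and the mid-values and frequencies are illustrative figures with class width h = 10.

```python
def central_moments_scaled(x, f, h, r_max=4):
    """Central moments mu_1..mu_r_max via z = (x - mean)/h and
    mu_r = h**r * (1/N) * sum of f * z**r, as in (7.33g)."""
    n = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / n
    z = [(xi - mean) / h for xi in x]                  # scaled deviations
    return [h ** r * sum(fi * zi ** r for zi, fi in zip(z, f)) / n
            for r in range(1, r_max + 1)]


mids = [5, 15, 25, 35, 45]       # class mid-values
freqs = [10, 20, 40, 20, 10]     # class frequencies
mu = central_moments_scaled(mids, freqs, h=10)
print(mu)  # [0.0, 120.0, 0.0, 36000.0]
```

Working with the small whole numbers z keeps the hand arithmetic short; the powers of h are restored only at the end.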
4. Converse. We can obtain an expression for $\mu_r'$ in terms of
$\mu_r$ as given below :

$$\mu_r' = \mu_r + {}^rC_1\,\mu_{r-1}\,\mu_1' + {}^rC_2\,\mu_{r-2}\,\mu_1'^2 + \cdots + \mu_1'^r, \tag{7.34}$$

which is the required expression giving moments about any point
'A' in terms of moments about mean.
In particular, taking r = 2, 3 and 4 in (7.34) and simplifying we
shall get respectively :

$$\mu_2' = \mu_2 + \mu_1'^2 \tag{7.35}$$
$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 \tag{7.36}$$
$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 \tag{7.37}$$

These formulae enable us to compute moments about any
arbitrary point, if we are given mean and moments about the mean.
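
Formulae (7.35) to (7.37) may be sketched in Python as below. The function name `raw_from_central` and its parameter names are hypothetical; the input figures (mean 2.5 and central moments 14.75, 39.75, 142.3125) are illustrative.

```python
def raw_from_central(mean, mu2, mu3, mu4, a=0.0):
    """First four moments about the point a, given the mean and the central
    moments, using (7.35), (7.36) and (7.37) with mu_1' = mean - a."""
    d = mean - a                                        # first moment about a
    m2 = mu2 + d ** 2
    m3 = mu3 + 3 * mu2 * d + d ** 3
    m4 = mu4 + 4 * mu3 * d + 6 * mu2 * d ** 2 + d ** 4
    return d, m2, m3, m4


about_origin = raw_from_central(2.5, 14.75, 39.75, 142.3125, a=0)
about_two = raw_from_central(2.5, 14.75, 39.75, 142.3125, a=2)
print(about_origin)  # (2.5, 21.0, 166.0, 1132.0)
print(about_two)     # (0.5, 15.0, 62.0, 244.0)
```

Only the distance between the mean and the new point enters the calculation, so the same central moments serve for every choice of the point.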


7.3.5. Sheppard's Correction for Moments. In case of
grouped or continuous frequency distribution, for the calculation of
moments, the value of the variable X is taken as the mid-point of
the corresponding class. This is based on the assumption that the
frequencies are concentrated at the mid-points of the corresponding
classes. This assumption is approximately true for distributions
which are symmetrical or moderately skewed and for which the class
intervals are not greater than one-twentieth (1/20th) of the range of
the distribution. However, in practice, this assumption is not true
in general and consequently some error, known as 'grouping error',
is introduced in the calculation of moments. W.F. Sheppard proved
that if

(i) the frequency distribution is continuous, and

(ii) the frequency tapers off to zero in both directions,

the effect due to grouping at the mid-point of the intervals can be
corrected by the following formulae, known as Sheppard's corrections :

$$\mu_2 \text{ (corrected)} = \mu_2 - \frac{h^2}{12} \tag{7.38}$$
$$\mu_3 \text{ (corrected)} = \mu_3 \tag{7.39}$$
$$\mu_4 \text{ (corrected)} = \mu_4 - \frac{1}{2}h^2\mu_2 + \frac{7}{240}h^4 \tag{7.40}$$

where h is the width of the class interval.


Remarks 1. This correction is valid only for symmetrical or
slightly asymmetrical continuous distributions and cannot be applied
in the case of extremely asymmetrical (skewed) distributions like
J-shaped or inverted J-shaped or U-shaped distributions. As a safeguard
against sampling errors, this correction should be applied only
if the total frequency is fairly large, say, greater than 1000.

2. It may be worthwhile to quote the words of A.E. Waugh
regarding Sheppard's corrections,


“The corrections are small and the statistician is foolish to 
bother with them if the original figures are rough approximations. 
But where we have continuous data with the characteristics describ- 
ed above and where the original measurements are reasonably 
precise, we may well apply Sheppard’s corrections to eliminate the 
grouping error.” 
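
Sheppard's corrections (7.38) to (7.40) amount to simple arithmetic on the uncorrected central moments. The sketch below is illustrative only: `sheppard_corrected` is a hypothetical name, and the input figures (uncorrected moments 120, 0, 36000 with class width 10) are chosen purely to show the arithmetic, without regard to whether the conditions for applying the correction hold for any particular data.

```python
def sheppard_corrected(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections (7.38)-(7.40) to the uncorrected
    central moments of a grouped distribution with class width h."""
    mu2_c = mu2 - h ** 2 / 12
    mu3_c = mu3                                       # third moment needs no correction
    mu4_c = mu4 - 0.5 * h ** 2 * mu2 + (7 / 240) * h ** 4
    return mu2_c, mu3_c, mu4_c


mu2_c, mu3_c, mu4_c = sheppard_corrected(120, 0, 36000, h=10)
print(round(mu2_c, 4), mu3_c, round(mu4_c, 4))
```

Note that the uncorrected second moment is used inside the correction for the fourth moment, as in (7.40).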


7.3.6. Charlier Checks. We have, on using binomial expansion
for positive integral index,

$$\left.\begin{aligned}
x+1 &= x+1 \\
(x+1)^2 &= x^2 + 2x + 1 \\
(x+1)^3 &= x^3 + 3x^2 + 3x + 1 \\
(x+1)^4 &= x^4 + 4x^3 + 6x^2 + 4x + 1
\end{aligned}\right\} \tag{*}$$

Multiplying both sides of (*) by f and adding over different
values of the variable X, we get the following identities :

$$\begin{aligned}
\sum f(x+1) &= \sum fx + N \\
\sum f(x+1)^2 &= \sum fx^2 + 2\sum fx + N \\
\sum f(x+1)^3 &= \sum fx^3 + 3\sum fx^2 + 3\sum fx + N \\
\sum f(x+1)^4 &= \sum fx^4 + 4\sum fx^3 + 6\sum fx^2 + 4\sum fx + N
\end{aligned}$$

These identities are known as Charlier checks and are used in
checking the calculations in the computation of the first four
moments.
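
The Charlier checks can be sketched in Python as below. The helper name `charlier_checks` is hypothetical, and the values of the variable (here the step deviations -2 to 2) and the frequencies are illustrative. In hand computation the left-hand sides come from an extra set of columns f(x+1), f(x+1)², and so on, and agreement with the right-hand sides confirms the column totals.

```python
def charlier_checks(x, f):
    """Return (lhs, rhs) of the four Charlier identities: the sums of
    f*(x+1)**k for k = 1..4 and their expansions in the sums of f*x**k."""
    n = sum(f)
    s = [sum(fi * xi ** k for xi, fi in zip(x, f)) for k in range(1, 5)]
    lhs = [sum(fi * (xi + 1) ** k for xi, fi in zip(x, f)) for k in range(1, 5)]
    rhs = [
        s[0] + n,
        s[1] + 2 * s[0] + n,
        s[2] + 3 * s[1] + 3 * s[0] + n,
        s[3] + 4 * s[2] + 6 * s[1] + 4 * s[0] + n,
    ]
    return lhs, rhs


d = [-2, -1, 0, 1, 2]        # values of the variable (step deviations)
f = [7, 13, 25, 25, 30]      # frequencies
lhs, rhs = charlier_checks(d, f)
print(lhs == rhs)  # True
```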

7.4. Karl Pearson's Beta ($\beta$) and Gamma ($\gamma$) Coefficients
Based on Moments. Prof. Karl Pearson defined the following four
coefficients based on the first four central moments.

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \tag{7.42}$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{7.43}$$
$$\gamma_1 = +\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} \tag{7.44}$$
$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3 \tag{7.45}$$

It may be stated here that these coefficients are pure numbers
independent of units of measurement and as such can be conveniently
used for comparative studies. In practice they are used as
measures of skewness and kurtosis as discussed in the following
sections.

Remark. Sometimes, another coefficient based on moments
viz., Alpha ($\alpha$) coefficient is used. Alpha coefficients are defined as

$$\alpha_1 = \frac{\mu_1}{\sigma} = 0,\qquad \alpha_2 = \frac{\mu_2}{\sigma^2} = 1 \tag{7.46}$$
$$\alpha_3 = \frac{\mu_3}{\sigma^3} = \sqrt{\beta_1} = \gamma_1 \tag{7.46a}$$
$$\alpha_4 = \frac{\mu_4}{\sigma^4} = \beta_2 \tag{7.46b}$$

7.5. Coefficient of Skewness based on Moments. Based
on the first four moments, Karl Pearson's coefficient of skewness
becomes

$$Sk = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2\,(5\beta_2 - 6\beta_1 - 9)} \tag{7.47}$$

where $\beta_1$ and $\beta_2$ are Pearson's coefficients defined in (7.42) and
(7.43) in terms of the first four central moments. Formula (7.47)
will give positive skewness if M > Mo and negative skewness if
M < Mo. Sk = 0, if $\beta_1 = 0$ or $\beta_2 + 3 = 0 \Rightarrow \beta_2 = -3$.

But $\beta_2 = \mu_4/\mu_2^2$ cannot be negative, since

$$\mu_4 = \frac{1}{N}\sum f(x-\bar{x})^4 > 0 \quad\text{and}\quad \mu_2 = \frac{1}{N}\sum f(x-\bar{x})^2 > 0.$$

Since $\beta_2$ cannot be negative, Sk = 0 if $\beta_1 = 0$, i.e., if $\mu_3 = 0$. Hence
for a symmetrical distribution, $\beta_1 = 0$. Accordingly, $\beta_1$ may be taken
as a relative measure of skewness based on moments.


Remark. The coefficient $\beta_1$ as a measure of skewness has a
serious limitation. $\mu_3$, being the sum of the cubes of the deviations
from the mean, may be positive or negative but $\mu_3^2$ is always positive.
Consequently $\beta_1$ cannot tell us the direction (positive or negative)
of skewness. This drawback is removed in Karl Pearson's coefficient
[Gamma One, ($\gamma_1$)] which is defined as the positive square root of
$\beta_1$, i.e.,

$$\gamma_1 = +\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3} \tag{7.48}$$

Thus the sign of skewness depends upon $\mu_3$. If $\mu_3$ is positive
we get positive skewness and if $\mu_3$ is negative, we get negative
skewness.
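
The two moment measures of skewness can be computed together, as in the minimal Python sketch below. The function name `moment_skewness` is hypothetical and the input figures (second central moment 8.5275, third central moment -2.6933) are illustrative.

```python
def moment_skewness(mu2, mu3):
    """Return (beta_1, gamma_1) from the second and third central moments.
    beta_1 = mu3**2 / mu2**3 is never negative; gamma_1 = mu3 / mu2**1.5
    carries the sign of mu3 and hence the direction of skewness."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / mu2 ** 1.5
    return beta1, gamma1


beta1, gamma1 = moment_skewness(8.5275, -2.6933)
print(round(beta1, 4), round(gamma1, 4))  # 0.0117 -0.1082
```

Here the negative sign of the second figure shows a slight negative skewness, which the first figure alone cannot reveal.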


7.6. Kurtosis. So far we have studied three measures viz.,
central tendency, dispersion and skewness to describe the characteristics
of a frequency distribution. However, even if we know all
these three measures we are not in a position to characterise a
distribution completely. The following diagram will clarify the point.

[Figure: three frequency curves. A: lepto-kurtic; B: meso-kurtic; C: platy-kurtic]

All the three curves are symmetrical about the mean and have
the same variation (range). In order to identify a distribution
completely we need one more measure which Prof. Karl Pearson called
'kurtosis'.

Curve of type B which is neither flat nor peaked is known as
Normal curve and shape of its hump is accepted as a standard one.
Curves with humps of the form of normal curve are said to have
normal kurtosis and are termed as meso-kurtic. The curves of the
type A, which are more peaked than the normal curve, are known
as lepto-kurtic and are said to lack kurtosis or to have negative
kurtosis. On the other hand, curves of the type C, which are flatter
than the normal curve, are called platy-kurtic and they are said to
possess kurtosis in excess or have positive kurtosis.

As a measure of kurtosis, Karl Pearson gave the coefficient
Beta two ($\beta_2$) or its derivative Gamma two ($\gamma_2$) defined as follows :

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4} \tag{7.49}$$
$$\gamma_2 = \beta_2 - 3 = \frac{\mu_4}{\sigma^4} - 3 \tag{7.50}$$

For a normal or meso-kurtic curve (Type B), $\beta_2 = 3$ or $\gamma_2 = 0$.
For a lepto-kurtic curve (Type A), $\beta_2 > 3$ or $\gamma_2 > 0$ and for a
platy-kurtic curve (Type C), $\beta_2 < 3$ or $\gamma_2 < 0$.

It is interesting to quote here the words of a British statistician
W.S. Gosset (who wrote under the pen name of Student), who very
humorously explains the use of the terms platy-kurtic and lepto-kurtic
in the following sentence : "Platykurtic curves, like the platypus,
are squat with short tails ; leptokurtic curves are high with long
tails like the kangaroos noted for leaping."

Gosset's little but humorous sketch is given below.

[Sketch: PLATYPUS and KANGAROOS]

Fig. 7.5
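
The classification by $\beta_2$ and $\gamma_2$ can be sketched in Python as follows. The function name `kurtosis_type` and the tolerance `tol` used to decide when $\gamma_2$ is treated as zero are assumptions made for illustration; the moments supplied are illustrative figures.

```python
def kurtosis_type(mu2, mu4, tol=1e-9):
    """Return (beta_2, gamma_2, label) from the second and fourth central
    moments, with beta_2 = mu4 / mu2**2 and gamma_2 = beta_2 - 3."""
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3
    if abs(gamma2) <= tol:
        label = "meso-kurtic"
    elif gamma2 > 0:
        label = "lepto-kurtic"
    else:
        label = "platy-kurtic"
    return beta2, gamma2, label


print(kurtosis_type(120, 36000))  # (2.5, -0.5, 'platy-kurtic')
print(kurtosis_type(3, 27))       # (3.0, 0.0, 'meso-kurtic')
```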
Example 7.13. The first three moments of a distribution about
the value 67 of the variable are 0.45, 8.73 and 8.91. Calculate the
second and third central moments and comment upon the nature of
the distribution.
[Delhi U. B.A. (Econ. Hons. I), 1985]

Solution. In the usual notations we are given :

$A = 67$, $\mu_1' = 0.45$, $\mu_2' = 8.73$ and $\mu_3' = 8.91$

The second and third central moments are given by :

$$\mu_2 = \mu_2' - \mu_1'^2 = 8.73 - (0.45)^2 = 8.73 - 0.2025 = 8.5275$$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 8.91 - 3 \times 8.73 \times 0.45 + 2(0.45)^3 = 8.91 - 11.7855 + 0.18225 = -2.6933$$

Hence the variance of the distribution is

$$\sigma^2 = \mu_2 = 8.5275 \quad\Rightarrow\quad \sigma \text{ (s.d.)} = \sqrt{8.5275} = 2.9202$$

Since $\mu_3$ is negative, the given distribution is negatively
skewed. In other words, the frequency curve has a longer tail
towards the left. Karl Pearson's moment coefficient of skewness
is given by :

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{-2.6933}{8.5275 \times 2.9202} = \frac{-2.6933}{24.9020} = -0.1082$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \gamma_1^2 = (-0.1082)^2 = 0.0117$$

Since $\gamma_1$ and $\beta_1$ are approximately zero, the given distribution
is approximately symmetrical.

Example 7.14. The first four moments of a distribution about
the origin are 1, 4, 10 and 46 respectively. Obtain the various
characteristics of the distribution on the basis of the information given.
Comment upon the nature of the distribution.

[I.C.W.A. (Intermediate) June 1984]

Solution. We are given the first four moments about origin.
In the usual notations we have :

$A = 0$, $\mu_1' = 1$, $\mu_2' = 4$, $\mu_3' = 10$ and $\mu_4' = 46$

The measure of central tendency is given by :

Mean ($\bar{x}$) = First moment about origin $= \mu_1' = 1$

The measure of dispersion is given by :

Variance $(\sigma^2) = \mu_2 = \mu_2' - \mu_1'^2 = 4 - 1 = 3$
$\Rightarrow$ s.d. $(\sigma) = \sqrt{3} = 1.732$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 10 - 3 \times 4 \times 1 + 2 \times 1 = 10 - 12 + 2 = 0$$

Karl Pearson's moment coefficient of skewness is given by :

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{0}{(\sqrt{3})^3} = 0$$

Since $\gamma_1 = 0$, the given distribution is symmetrical, i.e.,

Mean = Median = Mode

for the given distribution. Moreover, the quartiles are equidistant
from the median, i.e.,

$Q_3 - \text{Median} = \text{Median} - Q_1$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = 46 - 4 \times 10 \times 1 + 6 \times 4 \times 1 - 3 \times 1 = 46 - 40 + 24 - 3 = 27$$

Hence Karl Pearson's measure of kurtosis is given by :

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{27}{9} = 3 \quad\Rightarrow\quad \gamma_2 = \beta_2 - 3 = 0$$

Since $\beta_2 = 3$, the given distribution is Normal (meso-kurtic).
Since $\beta_1 = 0$ and $\beta_2 = 3$, the given distribution is a normal
distribution with mean $(\bar{x}) = 1$ and s.d. $(\sigma) = \sqrt{3} = 1.732$.
Example 7.15. Find the standard deviation and kurtosis of the
following series by the method of moments.

Class interval :  0–10   10–20   20–30   30–40   40–50
Frequency      :   10      20      40      20      10

[Delhi U. B.Com. (Hons.) II, 1984]

Solution.

CALCULATIONS FOR MEAN

Class Interval   Frequency (f)   Mid-Value (x)      fx
 0–10                 10               5             50
10–20                 20              15            300
20–30                 40              25           1000
30–40                 20              35            700
40–50                 10              45            450
Total            Σf = N = 100                  Σfx = 2500

$$\text{Mean } (\bar{x}) = \frac{\sum fx}{N} = \frac{2500}{100} = 25$$

Since the mean is an integer, we compute the central
moments directly by the formula :

$$\mu_r = \frac{1}{N}\sum f\,(x-\bar{x})^r \tag{*}$$

Let us take

$$z = \frac{x-\bar{x}}{h} \quad\Rightarrow\quad x - \bar{x} = hz \tag{**}$$

where h is the magnitude of the class intervals. Substituting in (*)

$$\mu_r = h^r \cdot \frac{1}{N}\sum f z^r \tag{***}$$

CALCULATIONS FOR CENTRAL MOMENTS

  x      f     z = (x − 25)/10     fz     fz²     fz³     fz⁴
  5     10          −2            −20      40     −80     160
 15     20          −1            −20      20     −20      20
 25     40           0              0       0       0       0
 35     20           1             20      20      20      20
 45     10           2             20      40      80     160
Total  Σf = N = 100              Σfz = 0  Σfz² = 120  Σfz³ = 0  Σfz⁴ = 360

Using (***), we get

$$\mu_1 = h \cdot \frac{\sum fz}{N} = 0$$
$$\mu_2 = h^2 \cdot \frac{\sum fz^2}{N} = 100 \times \frac{120}{100} = 120$$
$$\mu_3 = h^3 \cdot \frac{\sum fz^3}{N} = 1000 \times \frac{0}{100} = 0$$
$$\mu_4 = h^4 \cdot \frac{\sum fz^4}{N} = 10{,}000 \times \frac{360}{100} = 36{,}000$$

$\therefore$ s.d. $(\sigma) = \sqrt{\mu_2} = \sqrt{120} = 10.95$

Karl Pearson's measure of kurtosis is given by :

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{36000}{(120)^2} = \frac{36000}{14400} = 2.5$$
$$\Rightarrow\quad \gamma_2 = \beta_2 - 3 = -0.5$$

Since $\beta_2 < 3$ (i.e., $\gamma_2 < 0$), the given distribution is platy-kurtic,
i.e., the frequency curve is flatter topped than the normal curve.

Example 7.16. Find the second, the third and the fourth central
moment of the frequency distribution given below. Hence find (i) a
measure of skewness and (ii) a measure of kurtosis, of the given
distribution.

Class Limits     Frequency
100–104.9             7
105–109.9            13
110–114.9            25
115–119.9            25
120–124.9            30
Total               100

[I.C.W.A. (Intermediate) December 1980]

Solution.

CALCULATIONS FOR MOMENTS

Class limits   Mid-Value (x)   d = (x − 112.45)/5     f      fd     fd²     fd³     fd⁴
100–104.9         102.45             −2               7     −14      28     −56     112
105–109.9         107.45             −1              13     −13      13     −13      13
110–114.9         112.45              0              25       0       0       0       0
115–119.9         117.45              1              25      25      25      25      25
120–124.9         122.45              2              30      60     120     240     480
Total                                        Σf = N = 100  Σfd = 58  Σfd² = 186  Σfd³ = 196  Σfd⁴ = 630

The raw moments about the arbitrary point A = 112.45 are
given by :

$$\mu_1' = h \cdot \frac{\sum fd}{N} = 5 \times \frac{58}{100} = 2.9$$
$$\mu_2' = h^2 \cdot \frac{\sum fd^2}{N} = 25 \times \frac{186}{100} = 46.50$$
$$\mu_3' = h^3 \cdot \frac{\sum fd^3}{N} = 125 \times \frac{196}{100} = 245.00$$
$$\mu_4' = h^4 \cdot \frac{\sum fd^4}{N} = 625 \times \frac{630}{100} = 3937.50$$

The central moments, i.e., the moments about mean, are given
by :

$\mu_1 = 0$

$$\mu_2 = \mu_2' - \mu_1'^2 = 46.50 - (2.9)^2 = 46.50 - 8.41 = 38.09$$
$\Rightarrow$ s.d. $(\sigma) = \sqrt{\mu_2} = 6.1717$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = 245 - 3 \times 46.50 \times 2.9 + 2 \times (2.9)^3 = 245 - 404.55 + 48.778 = -110.772$$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = 3937.5 - 4 \times 245 \times 2.9 + 6 \times 46.5 \times (2.9)^2 - 3 \times (2.9)^4$$
$$= 3937.5 - 2842 + 2346.39 - 212.1843 = 3229.71$$

(i) Absolute measure of skewness is given by : $\mu_3 = -110.772$.
Since $\mu_3$ is negative the given distribution is negatively skewed.
The relative measure of skewness is given by :

$$\gamma_1 = \frac{\mu_3}{\sigma^3} = \frac{-110.772}{38.09 \times 6.1717} = \frac{-110.772}{235.08} = -0.4712$$
$$\Rightarrow\quad \beta_1 = \gamma_1^2 = 0.2220$$

(ii) The relative measure of kurtosis is given by Pearson's
coefficients $\beta_2$ and $\gamma_2$ defined as :

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3229.71}{(38.09)^2} = \frac{3229.71}{1450.8481} = 2.2261$$
$$\Rightarrow\quad \gamma_2 = \beta_2 - 3 = 2.2261 - 3 = -0.7739$$

Since $\beta_2 < 3$ (i.e., $\gamma_2 < 0$), the given distribution is platy-kurtic,
i.e., the frequency curve is slightly flatter than the normal curve.


Example 7.17. The first four moments of a distribution about
the value 4 of the variable are $-1.5$, 17, $-30$ and 108. Find the
moments about mean, $\beta_1$ and $\beta_2$.

Find also the moments about (i) the origin, and (ii) the point
x = 2.

Solution. In the usual notations, we are given A = 4 and

$\mu_1' = -1.5$, $\mu_2' = 17$, $\mu_3' = -30$ and $\mu_4' = 108$.

Moments about mean :

$$\mu_2 = \mu_2' - \mu_1'^2 = 17 - (-1.5)^2 = 17 - 2.25 = 14.75$$

$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -30 - 3 \times (17) \times (-1.5) + 2(-1.5)^3 = -30 + 76.5 - 6.75 = 39.75$$

$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 = 108 - 4(-30)(-1.5) + 6(17)(-1.5)^2 - 3(-1.5)^4$$
$$= 108 - 180 + 229.5 - 15.1875 = 142.3125$$

Hence
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(39.75)^2}{(14.75)^3} = \frac{1580.06}{3209.05} = 0.4924$$
$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{142.3125}{(14.75)^2} = \frac{142.3125}{217.5625} = 0.6541$$

Also $\bar{x} = A + \mu_1' = 4 + (-1.5) = 2.5$

Moments about origin. We have

$\bar{x} = 2.5$, $\mu_2 = 14.75$, $\mu_3 = 39.75$ and $\mu_4 = 142.31$ (approx.).

We know $\bar{x} = A + \mu_1'$, where $\mu_1'$ is the first moment about the point
x = A. Taking A = 0 we get the first moment about origin as
$\mu_1' = \text{mean} = 2.5$.
Using (7.35) to (7.37) we get

$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + (2.5)^2 = 14.75 + 6.25 = 21$$
$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(2.5) + (2.5)^3 = 39.75 + 110.625 + 15.625 = 166$$
$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 = 142.3125 + 4(39.75)(2.5) + 6(14.75)(2.5)^2 + (2.5)^4$$
$$= 142.3125 + 397.5 + 553.125 + 39.0625 = 1132$$

Moments about the point x = 2. We have $\bar{x} = A + \mu_1'$. Taking
A = 2, the first moment about the point x = 2 is

$\mu_1' = \bar{x} - 2 = 2.5 - 2 = 0.5$

Hence

$$\mu_2' = \mu_2 + \mu_1'^2 = 14.75 + 0.25 = 15$$
$$\mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 = 39.75 + 3(14.75)(0.5) + (0.5)^3 = 39.75 + 22.125 + 0.125 = 62$$
$$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 = 142.3125 + 4(39.75)(0.5) + 6(14.75)(0.5)^2 + (0.5)^4$$
$$= 142.3125 + 79.5 + 22.125 + 0.0625 = 244$$
Example 7.18. Examine whether the following results of a
piece of computation for obtaining the second central moment
are consistent or not : $N = 120$, $\sum fX = -125$, $\sum fX^2 = 128$.
[I.C.W.A. (Intermediate) Dec. 1982]

Solution. We have

$$\mu_2 = \frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2 = \frac{128}{120} - \left(\frac{-125}{120}\right)^2 = 1.0667 - 1.0851 = -0.0184$$

which is impossible, since variance cannot be negative. Hence the
given data are inconsistent.
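
The same consistency test can be sketched in Python. The function name `second_central_moment` is hypothetical; the routine merely evaluates the second central moment from the three totals and checks that it is not negative.

```python
def second_central_moment(n, sum_fx, sum_fx2):
    """mu_2 = (sum f*X**2)/N - ((sum f*X)/N)**2, computed from the totals."""
    return sum_fx2 / n - (sum_fx / n) ** 2


mu2 = second_central_moment(120, -125, 128)
consistent = mu2 >= 0            # a variance can never be negative
print(round(mu2, 4), consistent)
```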


EXERCISE 7.3

1. Distinguish between "Skewness" and "Kurtosis" and bring out
their importance in describing frequency distributions.
(Punjabi U. M.A. Econ. 1979)

2. "Averages, dispersion, skewness and kurtosis are complementary to
one another in understanding a frequency distribution." Elucidate.
[Osmania U. B.Com. (Hons.) Nov. 1981]

3. (a) Explain the terms Skewness and Kurtosis used in connection with
the frequency distribution of a continuous variable. Give the different measures
of skewness (any three of the measures to be given) and kurtosis.

[I.C.W.A. (Intermediate) June 1977]

(b) What do you mean by 'Skewness' in Statistics ? Explain one of the
methods of measuring skewness.

(Karnataka U. B.Com. Nov. 1981)

4. Explain briefly how the measures of skewness and kurtosis can be
used in describing a frequency distribution.

[Delhi Uni. MBA 1976, 71]


5. What is meant by moment of a distribution ? Show how moments
are used to describe the characteristics of a distribution viz., central tendency,
dispersion, skewness and kurtosis. [I.C.W.A. (Intermediate) June 1975]

6. Why do we calculate, in general, only the first four moments about
mean, of a distribution and not the higher moments ?

7. Find the first, second, third and fourth central moments of the
set of numbers 2, 4, 6, 8. [I.C.W.A. (Intermediate) June 1983, June 1982]

Ans. $\mu_1 = 0$, $\mu_2 = 5$, $\mu_3 = 0$, $\mu_4 = 41$

8. Calculate $\beta_1$ and $\beta_2$ (measures of skewness and kurtosis) for the
following frequency distribution and hence comment on the type of the
frequency distribution :
445.6 
7 2 t 

[Punjab U. M.A. (Econ) 1981] 
Ans. $\beta_1 = 0.0204$ ; $\beta_2 = 3.1080$. Distribution is approximately normal.


9. Find the first four moments about the mean in the following
distribution.

Height (in inches) :  60–62   63–65   66–68   69–71   72–74
Frequency          :    5       18      42      27       8

[Delhi U. B.Com. (Hons.) II, 1982]
Ans. $\mu_1 = 0$, $\mu_2 = 8.5275$, $\mu_3 = -2.6933$, $\mu_4 = 199.3759$.
10. Find the variance, skewness and kurtosis of the following series
by the method of moments :
Class interval :  0–10   10–20   20–30   30–40
Frequency      :    1       4       3       2
[Delhi U. B.Com. (Hons.) 1983]

Ans. $\sigma^2 = 84$, $\gamma_1 = 0.0935$, $\beta_2 = 2.102$
11. Find the kurtosis for the following distribution.
Class Interval :  0–10   10–20   20–30   30–40
Frequency      :    1       3       4       2
Comment on the nature of the distribution.
[I.C.W.A. (Intermediate) June 1982]

Ans. $\mu_2 = 81$, $\mu_4 = 14817$, $\beta_2 = 2.26$.

12. Find the second, third and fourth central moments of the
frequency distribution given below. Hence find (i) a measure of skewness
($\gamma_1$) and (ii) a measure of kurtosis ($\gamma_2$).

Class limit      Frequency      Class limit      Frequency
110.0–114.9          5          130.0–134.9         10
115.0–119.9         15          135.0–139.9         10
120.0–124.9         20          140.0–144.9          5
125.0–129.9         35
                                Total              100

[I.C.W.A. (Intermediate) June 1976]

Ans. $\mu_2 = 54$, $\mu_3 = 100.5$, $\mu_4 = 7827$ ; $\gamma_1 = 0.2533$, $\gamma_2 = -0.3158$


13. The first two moments of a distribution about the value 5 of the
variable are 2 and 20. Find the mean and the variance.

[I.C.W.A. (Intermediate) June 1977]

Ans. Mean = 7, Variance = 16

14. The first three moments of a distribution about the value 7, calculated
from a set of 9 observations, are 0.2, 19.4 and $-41$. Find the measures of
central tendency and dispersion and also the third moment about mean.

[I.C.W.A. (Intermediate) Dec. 1975]
Ans. Mean = 7.2 ; $\mu_2$ = Variance = 19.36 ; $\mu_3 = -52.48$


15. The first three moments of a distribution about the value 2 of the variable
are 1, 16 and $-40$. Show that the mean is 3, the variance is 15 and $\mu_3 = -86$.
Also show that the first three moments about x = 0 are 3, 24 and 76.

16. In a certain distribution the first four moments about the point 4 are
1.5, 17, $-30$ and 108 respectively. Find the kurtosis of the frequency curve
and comment on its shape.
[I.C.W.A. (Intermediate) Dec. 1982]
Ans. $\beta_2 = 2.3088$. Distribution is platy-kurtic.

17. The first four central moments are 0, 4, 8 and 144. Examine the
skewness and kurtosis. [I.C.W.A. (Intermediate) June 1984]

Ans. $\gamma_1 = 1$, $\beta_2 = 9 \Rightarrow \gamma_2 = 6$.
18. The central moments of a distribution are given by :
$\mu_2 = 140$, $\mu_3 = 148$, $\mu_4 = 6030$.

Calculate the moment measures of skewness and kurtosis and comment
on the shape of the distribution.

[I.C.W.A. (Intermediate) December, 1984]
Ans. $\gamma_1 = 0.0893$ ; $\beta_2 = 0.3076$ ; Distribution is approximately symmetrical and
platy-kurtic.


19. The standard deviation of a symmetrical distribution is 5. What
must be the value of the fourth moment about the mean in order that the
distribution be (a) lepto-kurtic, (b) meso-kurtic, and (c) platy-kurtic ?

Ans. (a) Distribution is lepto-kurtic if $\mu_4 > 1875$
(b) Distribution is meso-kurtic if $\mu_4 = 1875$
(c) Distribution is platy-kurtic if $\mu_4 < 1875$

20. Obtain the measure of skewness from the following data and write
a note on the results obtained :

Mean = 0.2944 ; Median = $-0.4018$ ; $Q_1 = -1.4568$ ; $Q_3 = 1.2316$ ;
$\sigma = 2.6408$ ; $\beta_1 = 3.854$.

Ans. Karl Pearson's skewness = 0.791 ; Bowley's skewness = 0.215,
$\gamma_1 = 1.9632$


21. From the data given below, first calculate the first four moments
about an arbitrary value and then, after Sheppard's corrections, calculate the
first four moments about the mean. Also calculate $\beta_2$ and comment on its
value :

Average number of hours worked by workers in 100 industries in 1968.
Hours worked      :  30–33   33–36   36–39   39–42   42–45   45–48
No. of Industries :    2       4       26      47      15       6

[Rajasthan Uni. M.Com. 1970]


Ans. Moments after applying Sheppard's correction are :
$\mu_1 = 0$, $\mu_2 = 8.01$, $\mu_3 = -20.69$, $\mu_4 = 249.393$


22. The first four moments of a distribution about the value 4 are $-1.5$,
17, $-130$, 108. Find whether the data are consistent.
Ans. $\mu_4 = 108 - 780 + 229.5 - 15.1875 = -457.6875$
Since $\mu_4$ is negative, data are inconsistent.
36. Fill in the blanks ; 
(а) (i) If 8;—3, the curve is called.... 
(ii) If B,3, the curve is calle 
(iii) If B,—3, the curve is calle .. 
(iv) If 8,—0, the curve'is called... 
(b) (i) is always...(>.=<). 
(ii) wy is always...(2,—«). 
(iii) us is always...(2,—«). 
(iv) B, is always...(2,—«) 
Comment on the result when equality sign holds. 


23. Fill in the blanks :
(i) Literal meaning of skewness is ......
(ii) Kurtosis is a measure of ...... of the frequency curve.
(iii) For a symmetrical distribution, mean, median and mode ......
(iv) If Mean < Mode, the distribution is ...... skewed.
(v) If Mean > Median, the distribution is ...... skewed.
(vi) Bowley's coefficient of skewness lies between ...... and ......
(vii) $\beta_1 = 0$ implies that distribution is ......
(viii) If $\gamma_1 > 0$, the distribution is ...... skewed.
(ix) If $\gamma_1 < 0$, the distribution is ...... skewed.
(x) For a normal curve $\beta_2$ = ......
(xi) If $\beta_2 > 3$, the curve is called ......
(xii) If $\beta_2 < 3$, the curve is called ......
(xiii) An absolute measure of skewness based on moments is ......
(xiv) An absolute measure of skewness based on quartiles is ......
(xv) Relative measure of skewness based on mean, s.d. and mode
is ......
(xvi) Relative measure of kurtosis is ......
(xvii) Relative measure of skewness in terms of moments is ......
(xviii) For a moderately asymmetrical distribution :
Mean − Median = ? (Mean − Mode)
(xix) If $\mu_3 = -1.48$, the curve of the given distribution is stretched
more to the ...... than to the ......
(xx) For a symmetrical distribution, ...... are equidistant from ......
(xxi) Mean = First moment about ......
(xxii) Variance = ...... moment about mean.
(xxiii) If $\mu_1'$ is the first moment about the point 'A' then Mean = ......
(xxiv) $\beta_1$ = ......, $\beta_2$ = ......, $\gamma_1$ = ......, $\gamma_2$ = ......
(xxv) In a moderately asymmetrical distribution the distance between
...... and ...... is ...... the distance between ...... and ......

Ans. (i) Lack of symmetry, (ii) Convexity (flatness or peakedness), (iii)
Coincide, (iv) Negatively, (v) Positively, (vi) $-1$ and 1, (vii)
Symmetrical, (viii) Positively, (ix) Negatively, (x) 3, (xi) Lepto-kurtic,
......, (xv) (M − Mo)/$\sigma$, (xvi) $\beta_2$ or $\gamma_2$, (xvii) $\beta_1$ or $\gamma_1$, (xviii) 1/3, (xix)
Left, Right, (xx) Quartiles, median, (xxi) Origin (i.e., A = 0), (xxii)
Second, (xxiii) $A + \mu_1'$, (xxiv) $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$, $\gamma_1 = +\sqrt{\beta_1}$ ; $\gamma_2 = \beta_2 - 3$.

24. State whether the following statements are true (T) or false (F).
(i) Skewness studies the flatness or peakedness of the distribution.
(ii) Kurtosis means 'lack of symmetry'.
(iii) For a symmetrical distribution $\beta_1 = 0$.
(iv) Skewness and kurtosis help us in studying the shape of the
frequency curve.
(v) Bowley's coefficient of skewness lies between $-3$ and 3.

(vi) Two distributions having the same values of mean, s.d. and
skewness must have the same kurtosis.

(vii) A positively skewed distribution curve is stretched more to the
right than to the left.
(viii) If $\beta_2 > 3$, the curve is called platy-kurtic.
(ix) If $\beta_2 = 3$, the curve is called normal.
(x) В. is always non-negative. 
(xi) P: can be negative. 
(xii) Variance = $\mu_2$ (2nd moment about mean)
(xiii) For a symmetrical distribution
$\mu_1 = \mu_3 = \mu_5 = \cdots = 0$.
(xiv) For a symmetrical distribution 
Mean? Median? Mode 
Qv) PiSu: 


Ans. (i) F, (ii) F, (iii) T, (iv) T, (v) F, (vi) F, (vii) T, (viii) F, (ix) T,
© OREN E Gt) "Gai, GA) FQ) Eo" COROT 
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8.1. Introduction. So far we have confined our discussion 
to univariate distributions only, i.e., the distributions involving only
one variable, and also saw how the various measures of central
tendency, dispersion, skewness and kurtosis can be used for the
purposes of comparison and analysis. We may, however, come
across certain series where each item of the series may assume the
values of two or more variables. If we measure the heights and
weights of n individuals, we obtain a series in which each unit
(individual) of the series assumes two values, one relating to heights
and the other relating to weights. Such a distribution, in which each
unit of the series assumes two values, is called a bivariate distribution.
Further, if we measure more than two variables on each unit
of a distribution, it is called a multivariate distribution. In a series,
the units on which different measurements are taken may be of
almost any nature such as different individuals, times, places etc.


(i) The series of marks of individuals in two subjects in an 
examination. 


(ii) The series of sales revenue and advertising expenditure of 
different companies in a particular year. 


(iii) The series of exports of raw cotton in crores of rupees 
and imports of manufactured goods during number of years from 
1979 to 1984, say. 


(iv) The series of ages of husbands and wives in a sample of
selected married couples and so on.


Thus in a bivariate distribution we are given a set of pairs of
observations, one value of each pair being the value of each of the
two variables.


In a bivariate distribution, we may be interested to find if
there is any relationship between the two variables under study.
Correlation is a statistical tool which studies the relationship
between two variables, and correlation analysis involves various
methods and techniques used for studying and measuring the extent
of the relationship between the two variables.

WHAT THEY SAY ABOUT CORRELATION: SOME
DEFINITIONS AND USES

“When the relationship is of a quantitative nature, the appropriate
statistical tool for discovering and measuring the relationship and
expressing it in a brief formula is known as correlation.”
Croxton and Cowden

“Correlation is an analysis of the covariation between two or
more variables.”
A.M. Tuttle

“Correlation analysis contributes to the understanding of economic
behaviour, aids in locating the critically important variables on
which others depend, may reveal to the economist the connections by
which disturbances spread and suggest to him the paths through which
stabilising forces may become effective.”
W.A. Neiswanger

“The effect of correlation is to reduce the range of uncertainty
of our prediction.”
Tippett


Two variables are said to be correlated if the change in one
variable results in a corresponding change in the other variable.

8.1.1. Types of Correlation
(a) POSITIVE AND NEGATIVE CORRELATION
(a) POSITIVE AND NEGATIVE CORRELATION 

If the values of the two variables deviate in the same direction,
i.e., if an increase in the values of one variable results, on an
average, in a corresponding increase in the values of the other
variable, or if a decrease in the values of one variable results, on
an average, in a corresponding decrease in the values of the other
variable, correlation is said to be positive or direct.


Some examples of series of positive correlation are :

(i) Heights and weights.

(ii) The family income and expenditure on luxury items.

(iii) Amount of rainfall and yield of crop (up to a point).

(iv) Price and supply of a commodity and so on.


On the other hand, correlation is said to be negative or inverse
if the variables deviate in the opposite direction, i.e., if an increase
(decrease) in the values of one variable results, on the average, in a
corresponding decrease (increase) in the values of the other
variable.


Some examples of negative correlation are the series relating
to :

(i) Price and demand of a commodity.

(ii) Volume and pressure of a perfect gas.

(iii) Sale of woollen garments and the day temperature, and
so on.

(b) LINEAR AND NON-LINEAR CORRELATION


The correlation between two variables is said to be linear if,
corresponding to a unit change in one variable, there is a constant
change in the other variable over the entire range of the values. For
example, let us consider the following data :


Thus for a unit change in the value of x, there is a constant
change, viz., 2, in the corresponding values of y. Mathematically,
the above data can be expressed by the relation


y = 2x + 3


In general, two variables x and y are said to be linearly related
if there exists a relationship of the form


y = a + bx                                                  ...(*)


between them. But we know that (*) is the equation of a straight
line with slope ‘b’ and which makes an intercept ‘a’ on the y-axis
[c.f. y = mx + c form of the equation of the line]. Hence, if the
values of the two variables are plotted as points in the xy-plane, we
shall get a straight line. This can be easily checked for the example
given above. Such phenomena occur frequently in physical sciences
but in economics and social sciences we very rarely come across
data which give a straight line graph. The relationship between
two variables is said to be non-linear or curvilinear if, corresponding
to a unit change in one variable, the other variable does not change
at a constant rate but at a fluctuating rate. In such cases, if the data
are plotted on the xy-plane, we do not get a straight line.
Mathematically speaking, the correlation is said to be non-linear if
the slope of the plotted curve is not constant. Such phenomena
are common in data relating to economics and social sciences.
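The distinction between the two cases can be checked numerically. The following is only an illustrative sketch in Python: the x values are hypothetical, the linear series uses the relation y = 2x + 3 quoted above, and the curve y = x² is an assumed example of a non-linear relation chosen purely for contrast.

```python
# Hypothetical x values, one unit apart (an assumption for illustration).
xs = [1, 2, 3, 4, 5]

# Linear relation quoted in the text: y = 2x + 3.
linear_y = [2 * x + 3 for x in xs]

# Assumed non-linear relation, used only for contrast: y = x^2.
curved_y = [x ** 2 for x in xs]


def unit_changes(values):
    """Change in y for each unit step in x."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


linear_steps = unit_changes(linear_y)  # every step is the same size
curved_steps = unit_changes(curved_y)  # the step size keeps changing

print("linear :", linear_steps)
print("curved :", curved_steps)
```

A constant list of steps corresponds to a constant slope, i.e., a straight line; a changing list of steps corresponds to a curve.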


Since the techniques for the analysis and measurement of non-linear
relationship are quite complicated and tedious as compared to
the methods of studying and measuring linear relationship, we
generally assume that the relationship between the two variables
under study is linear. In this chapter we shall confine ourselves
to the measurement of linear relationship only. The measurement
of non-linear relationship is, however, beyond the scope of this
book.


to establish mathematical relationship between the variables under
study since in such phenomena, the values of the variables under
study are affected simultaneously by multiplicity of factors and it is 
extremely difficult, sometimes impossible, to study the effects of 
each factor separately. Hence, in the data relating to social and 
economic phenomena, the study of correlation cannot be as 
accurate and precise. 

8.1.2. Correlation and Causation. Correlation analysis enables
us to have an idea about the degree and direction of the relationship
between the two variables under study. However, it fails to
reflect upon the cause and effect relationship between the variables.
In a bivariate distribution, if the variables have a cause and effect
relationship, they are bound to vary in sympathy with each other
and, therefore, there is bound to be a high degree of correlation
between them. In other words, causation always implies correlation.
However, the converse is not true, i.e., even a fairly high degree of
correlation between the two variables need not imply a cause and
effect relationship between them. The high degree of correlation
between the variables may be due to the following reasons :

1. Mutual dependence. The phenomena under study may
inter-influence each other. Such situations are usually observed in
data relating to economic and business situations. For instance, it is
a well known principle in economics that the price of a commodity
is influenced by the forces of supply and demand. If the price of a
commodity increases, its demand generally decreases (other factors
remaining constant). Here increased price is the
cause and reduction in demand is the effect. However, a decrease in
the demand of a commodity due to emigration of the people or due
to fashion or some other factors like changes in the tastes and habits
of people may result in a decrease in its price. Here, the cause is the
reduced demand and the effect is the reduced price. Accordingly,
the two variables may show a good degree of correlation due to
interaction of each on the other, yet it becomes very difficult to
isolate the exact cause from the effect.

2. Both the variables being influenced by the same external
factors. A high degree of correlation between the two variables
may be due to the effect or interaction of a third variable or a
number of variables on each of these two variables. For example,
a fairly high degree of correlation may be observed between the
yield per hectare of two crops, say, rice and potato, due to the effect
of a number of factors like favourable weather conditions, fertilisers
used, irrigation facilities, etc., on each of them. But neither of
the two is the cause of the other.

3. Pure chance. It may happen that a small randomly selected
sample from a bivariate distribution may show a fairly high
degree of correlation though, actually, the variables may not be
correlated in the population. Such correlation may be attributed
to chance fluctuations. Moreover, conscious or unconscious
bias on the part of the investigator in the selection of the sample
may also result in a high degree of correlation in the sample. In this
connection, it may be worthwhile to make a mention of two
phenomena between which a fairly high degree of correlation may be
observed, though it is not possible to conceive them as being causally
related. For example, we may observe a high degree of correlation
between the size of shoe and the intelligence of a group of persons.
Such correlation is called spurious or non-sense correlation. [For
details see § 8.4.2 (iii)].
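The second reason above (a common external factor) can be illustrated by a small simulation. This is a minimal sketch under stated assumptions, not an analysis of real data: the series `weather`, `rice` and `potato` are artificial random numbers, the noise level 0.3 and the seed are arbitrary choices, and the helper `pearson_r` simply applies the Pearsonian coefficient defined later in this chapter.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearsonian correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


random.seed(0)  # fixed seed so that the run is repeatable

# Hypothetical common factor, e.g. an index of weather conditions.
weather = [random.gauss(0, 1) for _ in range(500)]

# Each yield depends on the common factor plus its own independent noise;
# neither yield appears in the equation of the other.
rice = [w + random.gauss(0, 0.3) for w in weather]
potato = [w + random.gauss(0, 0.3) for w in weather]

r_common = pearson_r(rice, potato)
print("r between the two yields:", round(r_common, 3))
```

In this artificial set-up the two yields come out highly correlated although neither is the cause of the other; the whole of the correlation is due to the shared factor.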

8.2. Methods of Studying Correlation. We shall confine
our discussion to the methods of ascertaining only linear relationship
between two variables (series). The commonly used methods for
studying the correlation between two variables are :

(i) Scatter diagram method.

(ii) Karl Pearson’s coefficient of correlation (Covariance
method).

(iii) Two-way frequency table (Bivariate correlation method).

(iv) Rank method.

(v) Concurrent deviations method.

8.3. Scatter Diagram Method. The scatter diagram is one of
the simplest ways of diagrammatic representation of a bivariate
distribution and provides us one of the simplest tools of ascertaining
the correlation between two variables. Suppose we are given n
pairs of values (x1, y1), (x2, y2), ..., (xn, yn) of two variables X and Y.
For example, if the variables X and Y denote the height and weight
respectively, then the pairs (x1, y1), (x2, y2), ..., (xn, yn) may represent
the heights and weights (in pairs) of n individuals. These n points
may be plotted as dots ( . ) in the xy-plane, the values of X being
measured along the x-axis and the values of Y along the y-axis.
(It is customary to take the dependent variable along the y-axis
and the independent variable along the x-axis.) The diagram of dots
so obtained is known as the scatter diagram. From the scatter diagram
we can form a fairly good, though rough, idea about the relationship
between the two variables. The following points may be borne in
mind in interpreting the scatter diagram regarding the correlation
between the two variables :


(i) If the points are very dense, i.e., very close to each other, a
fairly good amount of correlation may be expected between the two
variables. On the other hand, if the points are widely scattered,
a poor correlation may be expected between them.


(ii) If the points on the scatter diagram reveal any trend
(either upward or downward), the variables are said to be correlated
and if no trend is revealed, the variables are uncorrelated.


(iii) If there is an upward trend rising from the lower left hand
corner and going upward to the upper right hand corner, the
correlation is positive since this reveals that the values of the two
variables move in the same direction. If, on the other hand, the
points depict a downward trend from the upper left hand corner to
the lower right hand corner, the correlation is negative since in this
case the values of the two variables move in opposite directions.


(iv) In particular, if all the points lie on a straight line starting
from the left bottom and going up towards the right top, the
correlation is perfect and positive, and if all the points lie on a straight
line starting from the left top and coming down to the right bottom, the
correlation is perfect and negative.

The following diagrams of the scattered data depict different
forms of correlation.

[Figs. 8.1 to 8.8: scatter diagrams with X on the horizontal axis and
Y on the vertical axis. Fig. 8.1: Perfect positive correlation.
Fig. 8.2: Perfect negative correlation. Fig. 8.3: Low degree of
positive correlation. Fig. 8.4: Low degree of negative correlation.
Fig. 8.5: High degree of positive correlation. Fig. 8.6: High degree
of negative correlation. Figs. 8.7 and 8.8: No correlation.]


Remarks 1. The method of scatter diagram is readily comprehensible
and enables us to form a rough idea of the nature of the
relationship between the two variables merely by inspection of the
graph. Moreover, this method is not affected by extreme observations
whereas all mathematical formulae of ascertaining correlation
between two variables are affected by extreme observations.
However, this method is not suitable if the number of observations
is fairly large.

2. The method of scatter diagram only tells us about the
nature of the relationship, whether it is positive or negative and
whether it is high or low. It does not provide us an exact measure
of the extent of the relationship between the two variables.


of obtaining the line of best fit is discussed in next chapter (Regres- 
sion Analysis). 


Example 8.1. Following are the heights and weights of 10
students of a B.Com. class.


Height (in inches) X : 62 72 68 58 65 70 66 63 60 72
Weight (kgs) Y       : 50 65 63 30 54 60 61 55 54 65


Draw a scatter diagram and indicate whether the correlation is
positive or negative.

Solution. The scatter diagram of the above data is shown
below.

[Fig. 8.9: Scatter diagram of the above data, with height X on the
horizontal axis (scale marked from 58 to 72) and weight Y on the
vertical axis.]


Since the points are dense i.e., close to each other, we may 
expect a high degree of correlation between the series of heights and 
weights. Further, since the points reveal an upward trend starting 
from left bottom and going up towards the right top, the correlation 
is positive. Hence we may expect a fairly high degree of positive 
correlation between the series of heights and weights in the class of 
B. Com. students. 
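The reading of the scatter diagram can be cross-checked without drawing it. The sketch below is only an illustration in Python, using the data of Example 8.1 as printed above: it counts how many points lie in the quadrants (taken about the two means) in which height and weight deviate in the same direction, and how many lie in the quadrants in which they deviate in opposite directions. The variable names are our own.

```python
heights = [62, 72, 68, 58, 65, 70, 66, 63, 60, 72]  # X, in inches
weights = [50, 65, 63, 30, 54, 60, 61, 55, 54, 65]  # Y, in kgs

n = len(heights)
mean_height = sum(heights) / n
mean_weight = sum(weights) / n

same_direction = 0      # both values above, or both below, their means
opposite_direction = 0  # one value above its mean, the other below

for h, w in zip(heights, weights):
    product = (h - mean_height) * (w - mean_weight)
    if product > 0:
        same_direction += 1
    elif product < 0:
        opposite_direction += 1

print("same direction     :", same_direction)
print("opposite direction :", opposite_direction)
```

A large majority of points in the "same direction" quadrants corresponds to the upward trend described above, i.e., to positive correlation.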

EXERCISE 8.1 


1. Explain clearly the concept of correlation. Clearly explain with
suitable illustrations its role in dealing with business problems.
(Delhi Uni. M.B.A. 1975) 
2. (a) Define the term correlation. Explain the concept of positive and 
negative correlation with examples. 


(b) State the nature of the following correlations (positive, negative or 
no correlation) : 

(i) Sale of woollen garments and the day temperature ; 

(ii) The colour of the sari and the intelligence of the lady who wears it ; 


(iii) Amount of rainfall and yield of crop.
[I.C.W.A. (Final) June 1977]
3. (a) Define correlation. Discuss its significance. Does correlation
always signify causal relationship between two variables ? Explain with
illustration.
(Delhi U. B.Com. 1985)
(b) Does the high degree of correlation between the two variables
signify the existence of cause and effect relationship between the two
variables ?
[Delhi U. B.Com. (Hons.) 1983, 1980]


4. (a) What is ‘spurious correlation’ and ‘nonsense or chance correlation’ ?
Explain with the help of an example.


(b) Comment on the following statement : “A high degree of positive
correlation between the ‘size of the shoe’ and the ‘intelligence’ of a group of
individuals implies that people with bigger shoe size are more intelligent than
the people with lower shoe size.”


5. What is correlation ? What is a scatter diagram ? How does it help 
in studying correlation between two variables, in respect of both its nature and 
extent ? 


[Delhi U. M.B.A. Dec. 1981 ; Poona Uni. B. Com. 1973] 
6. (a) Explain clearly the scatter diagram method of measuring
correlation. Do you think it is a perfect method ?
(b) Distinguish between positive and negative correlation with the help
of a scatter diagram. [Delhi U. B.Com. 1973]
7. (a) Write a note on scatter diagram. Draw sketches of scatter diagram 
to show the following correlation between two variables x and y: 
(i) linear 
(ii) linear and perfect 
(iii) non-linear 
(iv) x and y uncorrelated.
[Bombay U. B.Com. Oct. 1981]


(b) While drawing a scatter diagram if all points appear to form a
straight line going downward from left to right, then it is inferred that there
is


(a) Perfect positive correlation.
(b) Simple positive correlation.
(c) Perfect negative correlation.
(d) No correlation.
[C.A. (Intermediate) Nov. 1983]


8. Given the following pairs of values :


Capital employed (Crores of Rs.) 2 3 5 6 8
Profits (Lakhs of Rs.) 6 5 7 8 12 11


(a) Make a scatter diagram.


(b) Do you think that there is any correlation between profits and
capital employed ? Is it positive or negative ? Is it high or low ?
9. Draw a scatter diagram from the following data :


Height (inches) : 62 72 70 60 67 70 64 65 60 70
Weight (lbs.)   : 50 65 63 52 56 60 59 58 54 65
Also indicate whether correlation is positive or negative.
[Delhi Uni. B.Com. 1977]
Ans. Positive Correlation.


10. Draw a scatter diagram for the data given below and interpret it.
X : 10 20 30 40 50 60 70 80
Y : 32 20 24 36 40 28 38 44

Karl Pearson’s measure, known as the Pearsonian correlation
coefficient between two variables (series) X and Y, usually denoted
by r(X, Y) or r_xy or simply r, is a numerical measure of linear
relationship between them and is defined as the ratio of the
covariance between X and Y, written as Cov (x, y), to the product of the
standard deviations of X and Y. Symbolically,

$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} \tag{8.1} $$


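If (x1, y1), (x2, y2), ..., (xn, yn) are n pairs of observations of
the variables X and Y in a bivariate distribution, then

$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}), \qquad \sigma_x = \sqrt{\frac{1}{n}\sum (x-\bar{x})^2}, \qquad \sigma_y = \sqrt{\frac{1}{n}\sum (y-\bar{y})^2} \tag{8.2} $$

summation being taken over n pairs of observations. Substituting
in (8.1) we get

$$ r = \frac{\frac{1}{n}\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\frac{1}{n}\sum (x-\bar{x})^2 \cdot \frac{1}{n}\sum (y-\bar{y})^2}} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \cdot \sum (y-\bar{y})^2}} \tag{8.3} $$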


The formula (8.3) can also be written as

$$ r = \frac{\sum dx \, dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} \tag{8.3a} $$

where dx and dy denote the deviations of x and y values from their
arithmetic means x̄ and ȳ respectively, i.e.,

$$ dx = x - \bar{x}, \qquad dy = y - \bar{y} \tag{8.3b} $$
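Simplifying (8.2) we get

$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y} \tag{8.4} $$

since Σ(x - x̄)(y - ȳ) = Σxy - x̄ Σy - ȳ Σx + n x̄ ȳ = Σxy - n x̄ ȳ. Hence

$$ \operatorname{Cov}(x, y) = \frac{1}{n}\sum xy - \frac{\sum x}{n}\cdot\frac{\sum y}{n} = \frac{1}{n^2}\Big[\, n\sum xy - \big(\sum x\big)\big(\sum y\big) \Big] \tag{8.4a} $$

$$ \sigma_x^2 = \frac{1}{n}\sum (x-\bar{x})^2 = \frac{1}{n}\sum x^2 - \bar{x}^2 \qquad \text{[c.f. Chapter 6]} $$

$$ \sigma_x^2 = \frac{1}{n}\sum x^2 - \Big(\frac{\sum x}{n}\Big)^2 = \frac{1}{n^2}\Big[\, n\sum x^2 - \big(\sum x\big)^2 \Big] \tag{8.4b} $$

Similarly we have

$$ \sigma_y^2 = \frac{1}{n^2}\Big[\, n\sum y^2 - \big(\sum y\big)^2 \Big] \tag{8.4c} $$

Substituting from (8.4a), (8.4b) and (8.4c) in (8.1) we get

$$ r = \frac{\frac{1}{n^2}\big[\, n\sum xy - (\sum x)(\sum y) \big]}{\sqrt{\frac{1}{n^2}\big[\, n\sum x^2 - (\sum x)^2 \big] \cdot \frac{1}{n^2}\big[\, n\sum y^2 - (\sum y)^2 \big]}} $$

$$ \Rightarrow \quad r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\big[\, n\sum x^2 - (\sum x)^2 \big]\big[\, n\sum y^2 - (\sum y)^2 \big]}} \tag{8.5} $$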

Remark. Formula (8.3) or (8.3a) is quite convenient to
apply if the means x̄ and ȳ come out to be integers (i.e., whole
numbers). If x̄ or/and ȳ is (are) fractional then the formula (8.3) or
(8.3a) is quite cumbersome to apply, since the computations of
Σ(x - x̄)², Σ(y - ȳ)² and Σ(x - x̄)(y - ȳ) are quite time consuming and
tedious. In such a case formula (8.5) may be used provided the
values of x or/and y are small. But if x and y assume large values,
the calculation of Σx², Σy² and Σxy is again quite time consuming.


Thus if (i) x̄ and ȳ are fractional and (ii) x and y assume large
values, the formulae (8.3) and (8.5) are not generally used for
numerical problems. In such cases, the step deviation method, where we
take the deviations of the variables X and Y from any arbitrary
points, is used. We shall discuss this method after the properties of
the correlation coefficient (c.f. § 8.4.1).


Example 8.2. Calculate Karl Pearson's coefficient of correlation
between expenditure on advertising and sales from the data
given below.

Advertising expenses ('000 Rs.) : 39 65 62 90 82 75 25 98 36 78
Sales (lakh Rs.)                : 47 53 58 86 62 68 60 91 51 84
[Delhi U. B. Com. 1985 ; C.A. (Intermediate) Nov. 1975]


Solution. Let the advertising expenses (in '000 Rs.) be
denoted by the variable x and the sales (in lakh Rs.) be denoted by
the variable y.

CALCULATIONS FOR CORRELATION COEFFICIENT

   x      y    dx = x - 65   dy = y - 66     dx²     dy²    dx.dy
  39     47        -26           -19         676     361      494
  65     53          0           -13           0     169        0
  62     58         -3            -8           9      64       24
  90     86         25            20         625     400      500
  82     62         17            -4         289      16      -68
  75     68         10             2         100       4       20
  25     60        -40            -6        1600      36      240
  98     91         33            25        1089     625      825
  36     51        -29           -15         841     225      435
  78     84         13            18         169     324      234

Σx = 650   Σy = 660   Σdx = 0   Σdy = 0   Σdx² = 5398   Σdy² = 2224   Σdx.dy = 2704

$$ \bar{x} = \frac{\sum x}{n} = \frac{650}{10} = 65, \qquad \bar{y} = \frac{\sum y}{n} = \frac{660}{10} = 66 $$

dx = x - x̄ = x - 65 ; dy = y - ȳ = y - 66

$$ r = \frac{\sum dx \, dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{2704}{\sqrt{5398 \times 2224}} = \frac{2704}{\sqrt{12005152}} = \frac{2704}{3464.845} = 0.7804 $$

Aliter.

$$ \log r = \log 2704 - \tfrac{1}{2}\big[\log 5398 + \log 2224\big] = 3.4320 - \tfrac{1}{2}(3.7325 + 3.3472) $$

$$ = 3.4320 - \frac{7.0797}{2} = 3.4320 - 3.53985 = -0.10785 = \bar{1}.89215 $$

$$ \Rightarrow \quad r = \operatorname{Antilog}(\bar{1}.89215) = 0.7802 $$

Hence there is a fairly high degree of positive correlation
between expenditure on advertising and sales. We may, therefore,
conclude that in general, sales have increased with an increase in
the advertising expenses.
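The arithmetic of Example 8.2 can be re-traced in a few lines of Python. This is only a sketch for checking the working: it evaluates r once from the deviations about the means, as in formula (8.3), and once from the raw sums, as in formula (8.5), to show that the two routes agree. The variable names are our own.

```python
from math import sqrt

x = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]  # advertising expenses ('000 Rs.)
y = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]  # sales (lakh Rs.)
n = len(x)

# Route 1: deviations from the arithmetic means, formula (8.3).
x_bar = sum(x) / n
y_bar = sum(y) / n
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))
r_dev = sum_dxdy / sqrt(sum_dx2 * sum_dy2)

# Route 2: raw sums of x, y, x^2, y^2 and xy, formula (8.5).
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
r_raw = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print("column totals:", sum_dx2, sum_dy2, sum_dxdy)
print("r by (8.3):", round(r_dev, 4))
print("r by (8.5):", round(r_raw, 4))
```

Formula (8.3) is the shorter route here because both means are whole numbers; formula (8.5) avoids the deviations altogether at the cost of larger products.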


Example 8.3. From the following table calculate the coefficient
of correlation by Karl Pearson's method.
X : 6   2  10   4   8
Y : 9  11   ?   8   7
Arithmetic means of X and Y series are 6 and 8 respectively.
(Delhi Uni. B.Com. 1980)

Solution. First of all we shall find the missing value of Y.
Let the missing value in the Y-series be a. Then the mean ȳ is given
by :

$$ \bar{y} = \frac{\sum Y}{n} = \frac{9 + 11 + a + 8 + 7}{5} = \frac{35 + a}{5} = 8 \quad \text{(given)} $$

⇒ 35 + a = 5 × 8 = 40
⇒ a = 40 - 35 = 5


CALCULATION OF CORRELATION COEFFICIENT

   X     Y    X - x̄   Y - ȳ   (X - x̄)²   (Y - ȳ)²   (X - x̄)(Y - ȳ)
   6     9      0       1        0          1             0
   2    11     -4       3       16          9           -12
  10     5      4      -3       16          9           -12
   4     8     -2       0        4          0             0
   8     7      2      -1        4          1            -2

ΣX = 30   ΣY = 40   Σ(X - x̄) = 0   Σ(Y - ȳ) = 0   Σ(X - x̄)² = 40   Σ(Y - ȳ)² = 20   Σ(X - x̄)(Y - ȳ) = -26

We have x̄ = ΣX/n = 30/5 = 6 and ȳ = ΣY/n = 40/5 = 8.

Karl Pearson's correlation coefficient is given by :

$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\frac{1}{n}\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\frac{1}{n}\sum (x-\bar{x})^2 \cdot \frac{1}{n}\sum (y-\bar{y})^2}} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \cdot \sum (y-\bar{y})^2}} $$

$$ = \frac{-26}{\sqrt{40 \times 20}} = \frac{-26}{\sqrt{800}} = \frac{-26}{28.284} = -0.9192 \approx -0.92 $$

Example 8.4. Calculate the coefficient of correlation between
X and Y series from the following data :

                                           X Series   Y Series
No. of pairs of observations                   15         15
Arithmetic mean                                25         18
Standard deviation                           3.01       3.03
Sum of squares of deviations from mean        136        138

Summation of product deviations of X and Y series from their
respective arithmetic means = 122.

[I.C.W.A. (Final) June 1981 ; Guru Nanak Dev U. B.Com.
1979 ; Delhi Uni. B.Com. (Hons.) 1976 ; B.Com. 1977, 73]

Solution. Here we are given :

$$ n = 15, \quad \bar{x} = 25, \quad \bar{y} = 18, \quad \sigma_x = 3.01, \quad \sigma_y = 3.03, $$

$$ \sum (x-\bar{x})^2 = 136, \quad \sum (y-\bar{y})^2 = 138 \quad \text{and} \quad \sum (x-\bar{x})(y-\bar{y}) = 122. $$

Karl Pearson's correlation coefficient between X-series and
Y-series is given by :

$$ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{n \, \sigma_x \, \sigma_y} = \frac{122}{15 \times 3.01 \times 3.03} $$

$$ \Rightarrow \quad \log r = 2.0864 - [1.1761 + 0.4786 + 0.4814] = 2.0864 - 2.1361 = -0.0497 = \bar{1}.9502 $$

$$ \Rightarrow \quad r = \operatorname{Antilog}(\bar{1}.9502) = 0.8917 $$

Remark. Here we see that some of the data are superfluous,
viz., x̄, ȳ, Σ(x - x̄)², Σ(y - ȳ)².
We may also compute the correlation coefficient using the
formula :

$$ r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \cdot \sum (y-\bar{y})^2}} = \frac{122}{\sqrt{136 \times 138}} $$

$$ \log r = 2.0864 - \tfrac{1}{2}\,[\,2.1335 + 2.1399\,] = 2.0864 - 2.1367 = -0.0503 = \bar{1}.9497 $$

$$ \Rightarrow \quad r = \operatorname{Antilog}(\bar{1}.9497) = 0.8904 $$

If we use this formula, then the data relating to n, x̄, ȳ, σx
and σy are superfluous.
Example 8.5. Calculate Karl Pearson's coefficient of correlation
from the following data :
(i) Sum of deviations of x = 5
(ii) Sum of deviations of y = 4
(iii) Sum of squares of deviations of x = 40
(iv) Sum of squares of deviations of y = 50
(v) Sum of product of deviations of x and y = 32
(vi) No. of pairs of observations = 10
(Delhi U. B.Com. 1983)
Solution. Let the variables U and V denote the deviations
of X and Y from arbitrary points A and B respectively, i.e., let
U = X - A and V = Y - B.
Then we are given :
Σu = 5, Σu² = 40, Σuv = 32,
Σv = 4, Σv² = 50, n = 10.

Karl Pearson's coefficient of correlation between X and Y is
given by [c.f. Property II, § 8.4.1] :

$$ r_{xy} = r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{\big[\, n\sum u^2 - (\sum u)^2 \big]\big[\, n\sum v^2 - (\sum v)^2 \big]}} = \frac{10 \times 32 - 5 \times 4}{\sqrt{(10 \times 40 - 25)(10 \times 50 - 16)}} $$

$$ = \frac{320 - 20}{\sqrt{375 \times 484}} = \frac{300}{\sqrt{181500}} = \frac{300}{426.03} = 0.7042 $$
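When only the totals are given, as in Example 8.5, the coefficient can be obtained from a small helper that takes the six summary figures. The function below is a hypothetical sketch of our own (the name `r_from_sums` and its argument names are assumptions); it applies the formula in terms of deviations from arbitrary points [c.f. Property II, § 8.4.1].

```python
from math import sqrt


def r_from_sums(n, sum_u, sum_v, sum_u2, sum_v2, sum_uv):
    """Correlation coefficient from totals of deviations u = x - A, v = y - B."""
    numerator = n * sum_uv - sum_u * sum_v
    denominator = sqrt((n * sum_u2 - sum_u ** 2) * (n * sum_v2 - sum_v ** 2))
    return numerator / denominator


# Totals as given in Example 8.5.
r_example = r_from_sums(n=10, sum_u=5, sum_v=4, sum_u2=40, sum_v2=50, sum_uv=32)
print(round(r_example, 4))
```

The individual observations, and even the arbitrary points A and B, are not needed; the six totals determine r completely.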


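Example 8.6. The coefficient of correlation between two variables
X and Y is 0.48. The covariance is 36. The variance of X is 16.
Find the standard deviation of Y.
[Delhi Uni. B.A. (Econ. Hons.) 1983]
Solution. We are given :

$$ r_{xy} = 0.48, \qquad \operatorname{Cov}(X, Y) = 36, \qquad \sigma_x^2 = 16 \;\Rightarrow\; \sigma_x = 4 $$

We have :

$$ r_{xy} = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \, \sigma_y} \quad \Rightarrow \quad \sigma_y = \frac{\operatorname{Cov}(X, Y)}{\sigma_x \, r_{xy}} = \frac{36}{4 \times 0.48} = \frac{9}{0.48} = 18.75 $$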

Example 8.7. A computer while calculating correlation coefficient
between two variables X and Y from 25 pairs of observations
obtained the following results :

n = 25, ΣX = 125, ΣX² = 650, ΣY = 100, ΣY² = 460, ΣXY = 508.

It was, however, discovered at the time of checking that two
pairs of observations were not correctly copied. They were taken as
(6, 14) and (8, 6) while the correct values were (8, 12) and (6, 8).
Prove that the correct value of the correlation coefficient should be
2/3. [I.C.W.A. (Final) December 1977]

Solution. Removing the wrong pairs from each total and adding the
correct pairs, we get :

Corrected ΣX = 125 - 6 - 8 + 8 + 6 = 125
Corrected ΣY = 100 - 14 - 6 + 12 + 8 = 100
Corrected ΣX² = 650 - 6² - 8² + 8² + 6² = 650
Corrected ΣY² = 460 - 14² - 6² + 12² + 8² = 436
Corrected ΣXY = 508 - 6 × 14 - 8 × 6 + 8 × 12 + 6 × 8 = 520

Corrected value of r is given by :

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\big[\, n\sum X^2 - (\sum X)^2 \big]\big[\, n\sum Y^2 - (\sum Y)^2 \big]}} = \frac{25 \times 520 - 125 \times 100}{\sqrt{\big[25 \times 650 - (125)^2\big]\big[25 \times 436 - (100)^2\big]}} $$

$$ = \frac{13000 - 12500}{\sqrt{(16250 - 15625)(10900 - 10000)}} = \frac{500}{\sqrt{625 \times 900}} = \frac{500}{25 \times 30} = \frac{2}{3} $$

8.4.1. Properties of Correlation Coefficient 
Property I. Limits for Correlation Coefficient. 


The Pearsonian correlation coefficient cannot exceed 1 numerically.
In other words, it lies between -1 and +1. Symbolically,

$$ -1 \le r \le 1 \tag{8.6} $$

Remarks 1. This theorem provides us a check on our calculations.
If in any problem, the obtained value of r lies outside the
limits ±1, this implies that there is some mistake in our
calculations.

2. r = +1 implies perfect positive correlation between the
variables and r = -1 implies perfect negative correlation between
the variables.

Property II. Correlation coefficient is independent of the change
of origin and scale. Mathematically, if we take

$$ u = \frac{x - A}{h}, \qquad v = \frac{y - B}{k} \tag{8.7} $$

where A, B, h and k are constants, h > 0, k > 0 ; then the correlation
coefficient between x and y is the same as the correlation coefficient
between u and v, i.e.,

$$ r(x, y) = r(u, v) \quad \Rightarrow \quad r_{xy} = r_{uv} \tag{8.8} $$

Remark. This is one of the very important properties of the
correlation coefficient and is extremely helpful in numerical
computation of r. We had already stated [c.f. Remark to § 8.3] that
formulae (8.3) and (8.5) become quite tedious to use in numerical
problems if x̄ and/or ȳ are in fractions or if x and y are large. In
such cases we can conveniently change the origin and scale (if
possible) in X or/and Y to get new variables U and V, and compute
the correlation between U and V by the formula

$$ r_{uv} = \frac{\sum (u-\bar{u})(v-\bar{v})}{\sqrt{\sum (u-\bar{u})^2 \cdot \sum (v-\bar{v})^2}} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{\big[\, n\sum u^2 - (\sum u)^2 \big]\big[\, n\sum v^2 - (\sum v)^2 \big]}} \tag{8.9} $$

Now using Property II, we finally get :

$$ r_{xy} = r_{uv}. $$

Property III. Two independent variables are uncorrelated but
the converse is not true.

If x and y are independent variables then

$$ r_{xy} = 0. $$

Converse. However, the converse of the theorem is not true,
i.e., uncorrelated variables need not necessarily be independent. As
an illustration consider the following bivariate distribution.


We have

$$ r_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{8 \times 0 - 0 \times 60}{\sqrt{n\sum x^2 - (\sum x)^2}\;\sqrt{n\sum y^2 - (\sum y)^2}} = 0, $$

because zero divided by any finite non-zero quantity is zero. Hence in
the above example the variables x and y are uncorrelated. But if we
examine the data carefully we find that x and y are not independent
but are connected by the relation y = x². The above example
illustrates that uncorrelated variables need not be independent.
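The converse can be tried out on a concrete data set. The values below are an assumption: they are one set of values satisfying y = x² and agreeing with the totals quoted in the working above (n = 8, Σx = 0, Σy = 60, Σxy = 0); they are not claimed to be the table originally printed. The sketch is in Python.

```python
from math import sqrt

# Assumed data: symmetric x values with y = x^2 (an illustrative choice).
x = [-4, -3, -2, -1, 1, 2, 3, 4]
y = [xi ** 2 for xi in x]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy)
print(r)
```

Here y is completely determined by x, yet r vanishes, because the positive and negative values of x contribute equal and opposite amounts to Σxy.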
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Remarks 1. One should not be confused with the words of 
uncorrelation and independence. r_xy = 0, i.e., uncorrelation between
the variables x and y, simply implies the absence of any linear
(straight line) relationship between them. They may, however, be related
in some other form (other than straight line), e.g., quadratic (as we
have seen in the above example), logarithmic or trigonometric
form.

2. Some more properties of the correlation coefficient will
be discussed in the next chapter on Regression Analysis.

Example 8.8. Calculate the coefficient of correlation for the
ages of husband and wife :

Age of Husband : 23, 27, 28, 29, 30, 31, 33, 35, 36, 39
Age of Wife    : 18, 22, 23, 24, 25, 26, 28, 29, 30, 32
(Sambalpur Uni., B.Com., 1983)


Solution.

CALCULATIONS FOR CORRELATION COEFFICIENT

   x     y    u = x - 31   v = y - 25     u²     v²     uv
  23    18        -8           -7         64     49     56
  27    22        -4           -3         16      9     12
  28    23        -3           -2          9      4      6
  29    24        -2           -1          4      1      2
  30    25        -1            0          1      0      0
  31    26         0            1          0      1      0
  33    28         2            3          4      9      6
  35    29         4            4         16     16     16
  36    30         5            5         25     25     25
  39    32         8            7         64     49     56

Σx = 311   Σy = 257   Σu = 1   Σv = 7   Σu² = 203   Σv² = 163   Σuv = 179


Karl Pearson’s correlation coefficient between U and V is
given by

$$ r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{\big[\, n\sum u^2 - (\sum u)^2 \big]\big[\, n\sum v^2 - (\sum v)^2 \big]}} = \frac{10 \times 179 - 1 \times 7}{\sqrt{\big[10 \times 203 - (1)^2\big]\big[10 \times 163 - (7)^2\big]}} $$

$$ = \frac{1790 - 7}{\sqrt{(2030 - 1)(1630 - 49)}} = \frac{1783}{\sqrt{2029 \times 1581}} = \frac{1783}{45.04 \times 39.76} = \frac{1783}{1790.79} = 0.9956 $$


Since Karl Pearson's correlation coefficient (r) is independent
of change of origin, we get

$$ r_{xy} = r_{uv} = 0.9956. $$

Example 8.9. Find Karl Pearson's coefficient of correlation
between sales and expenses of the following ten firms :

Firm                          :  1   2   3   4   5   6   7   8   9  10
Sales (in thousand units)     : 50  50  55  60  65  65  65  60  60  50
Expenses (in thousand rupees) : 11  13  14  16  16  15  15  14  13  13

[Punjab Uni. B.A. (Econ. Hons. II), April 1980 ;
Guru Nanak Dev U. B.Com. II, April 1983]

Solution. Let x denote the sales and y the expenses, and let

u = (x - 65)/5 ; v = y - 13.
CALCULATIONS FOR CORRELATION COEFFICIENT

Firm    x     y    u = (x - 65)/5   v = y - 13    u²    v²    uv
  1    50    11         -3              -2         9     4     6
  2    50    13         -3               0         9     0     0
  3    55    14         -2               1         4     1    -2
  4    60    16         -1               3         1     9    -3
  5    65    16          0               3         0     9     0
  6    65    15          0               2         0     4     0
  7    65    15          0               2         0     4     0
  8    60    14         -1               1         1     1    -1
  9    60    13         -1               0         1     0     0
 10    50    13         -3               0         9     0     0

Σx = 580   Σy = 140   Σu = -14   Σv = 10   Σu² = 34   Σv² = 32   Σuv = 0


Karl Pearson's correlation coefficient between u and v is
given by

$$ r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{\big[\, n\sum u^2 - (\sum u)^2 \big]\big[\, n\sum v^2 - (\sum v)^2 \big]}} = \frac{10 \times 0 - (-14) \times (10)}{\sqrt{\big[10 \times 34 - (-14)^2\big]\big[10 \times 32 - (10)^2\big]}} $$

$$ = \frac{140}{\sqrt{(340 - 196)(320 - 100)}} = \frac{140}{\sqrt{144 \times 220}} = \frac{140}{\sqrt{31680}} = \frac{140}{177.99} = 0.7866 $$

Since correlation coefficient is independent of change of origin
and scale, we finally have

$$ r_{xy} = r_{uv} = 0.7866. $$


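Aliter. We have :

$$ \bar{x} = \frac{\sum x}{n} = \frac{580}{10} = 58, \qquad \bar{y} = \frac{\sum y}{n} = \frac{140}{10} = 14 $$

Since x̄ and ȳ are integers, it will be convenient to compute
r by taking the deviations from means directly, i.e., by taking :

dx = x - x̄ = x - 58 ; dy = y - ȳ = y - 14.

CALCULATIONS FOR CORRELATION COEFFICIENT

   x     y    dx = x - 58   dy = y - 14    dx²    dy²    dx.dy
  50    11        -8            -3          64      9      24
  50    13        -8            -1          64      1       8
  55    14        -3             0           9      0       0
  60    16         2             2           4      4       4
  65    16         7             2          49      4      14
  65    15         7             1          49      1       7
  65    15         7             1          49      1       7
  60    14         2             0           4      0       0
  60    13         2            -1           4      1      -2
  50    13        -8            -1          64      1       8

Σx = 580   Σy = 140   Σdx = 0   Σdy = 0   Σdx² = 360   Σdy² = 22   Σdx.dy = 70

$$ r = \frac{\sum dx \, dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{70}{\sqrt{360 \times 22}} = \frac{70}{\sqrt{7920}} = \frac{70}{88.9944} = 0.7866 $$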

Example 8.10. Find Karl Pearson's coefficient of correlation
between the age and the playing habit of the people from the following
information :

Age group (years)        No. of people    No. of players
15 and less than 20           200              150
20 and less than 25           270              162
25 and less than 30           340              170
30 and less than 35           360              180
35 and less than 40           400              180
40 and less than 45           300              120


Also mention what your calculated ‘r’ indicates.
[C.A. (Intermediate), November 1983]


Solution. We want to find Karl Pearson’s correlation coeffi- 
cient between the age and the playing habit of the people. To do 
this, we first express the number of players in each age group on a
common base i.e. we find the number of players out of a fixed 
number of persons (a common base) which may be taken as 100 or 
1000 or some other convenient figure. Here we express the number 
of players as a percentage of the total people in each age group. 


Now we compute Karl Pearson's correlation coefficient bet- 
ween age (x) and the percentage of players in each age group (y). 


Age group (yrs.)   No. of people   No. of players   Percentage of players (y)
15-20                  200             150          (150/200) x 100 = 75
20-25                  270             162          (162/270) x 100 = 60
25-30                  340             170          (170/340) x 100 = 50
30-35                  360             180          (180/360) x 100 = 50
35-40                  400             180          (180/400) x 100 = 45
40-45                  300             120          (120/300) x 100 = 40

CALCULATIONS FOR CORRELATION COEFFICIENT

Age-group   Mid-value (x)    y    u=(x−27.5)/5   v=(y−50)/5    u²    v²    uv
15-20           17.5        75        −2             5          4    25   −10
20-25           22.5        60        −1             2          1     4    −2
25-30           27.5        50         0             0          0     0     0
30-35           32.5        50         1             0          1     0     0
35-40           37.5        45         2            −1          4     1    −2
40-45           42.5        40         3            −2          9     4    −6

Total                               Σu=3           Σv=4     Σu²=19  Σv²=34  Σuv=−20

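Since correlation coefficient is independent of change of origin
and scale we have :

$$r_{xy} = r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{[n\sum u^2 - (\sum u)^2]\,[n\sum v^2 - (\sum v)^2]}}$$

$$= \frac{6 \times (-20) - (3)(4)}{\sqrt{[6 \times 19 - (3)^2]\,[6 \times 34 - (4)^2]}} = \frac{-120 - 12}{\sqrt{(114 - 9)(204 - 16)}} = \frac{-132}{\sqrt{105 \times 188}}$$

$$= \frac{-132}{\sqrt{19740}} = \frac{-132}{140.499} = -0.9395$$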


Thus we conclude that there is a very high degree of negative
correlation (almost perfect negative correlation) between age (x)
and playing habit (y). This implies that with advancement in age,
the playing habit of the people declines.
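
The two stages of Example 8.10 (reducing the players to a common base, then correlating the percentages with the class mid-values) can be sketched in Python as follows. The names `people`, `players`, `mid_ages` and `pearson_r` are illustrative only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data of Example 8.10
people = [200, 270, 340, 360, 400, 300]
players = [150, 162, 170, 180, 180, 120]
mid_ages = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5]  # mid-values of the age groups

# Step 1: express the players on a common base of 100 persons
pct = [100 * p / n for p, n in zip(players, people)]

# Step 2: correlate age (mid-value) with the percentage of players
r = pearson_r(mid_ages, pct)
print(pct)
print(round(r, 4))  # about -0.9395
```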


8.4.2. Assumptions Underlying Karl Pearson's Correlation
Coefficient. Pearsonian correlation coefficient r is based on the
following assumptions :


(i) The variables X and Y under study are linearly related.
In other words, the scatter diagram of the data will give a straight
line curve.

(ii) Each of the variables (series) is being affected by a large
number of independent contributory causes of such a nature as to
produce normal distribution. For example, the variables (series)
relating to ages, heights, weights, supply, price, etc., conform to this
assumption. In the words of Karl Pearson :


"The sizes of the complex of organs (something measurable) are
determined by a great variety of independent contributing causes, for
example, climate, nourishment, physical training and innumerable
other causes which cannot be individually observed or their effects
measured." Karl Pearson further observes, "The variations in intensity
of the contributory causes are small as compared with their absolute
intensity and these variations follow the normal law of distribution."


(iii) The forces so operating on each of the variable series are 
not independent of each other but are related in a causal fashion. 
In other words, cause and effect relationship exists between different 
forces operating on the items of the two variable series. These 
forces must be common to both the series. If the operating forces
are entirely independent of each other and not related in any 
fashion, then there can not be any correlation between the variables 
under study. 


For example the correlation coefficient between : 


(a) the series of heights and income of individuals over a 
period of time, 


(b) the series of marriage rate and the rate of agricultural 
production in a country over a period of time, 


(c) the series relating to the size of the shoe and intelligence 
of a group of individuals, 


should be zero, since the forces affecting the two variable series in 
each of the above cases are entirely independent of each other. 


However, if in any of the above cases the value of r for a 
given set of data is not zero, then such correlation is termed as
chance correlation or spurious or non-sense correlation. 


[Also see § 8.1.3]

8.4.3. Interpretation of r. The following general points may
be borne in mind while interpreting an observed value of correlation
coefficient r :

(i) r = +1 implies that there is perfect positive correlation
between the variables. In other words, the scatter diagram will be 
a straight line starting from left bottom and rising upwards to the 
right top as shown in figure 8.1, § 8.3. 


(ii) If r = −1, there is perfect negative correlation between
the variables. In this case scatter diagram will again be a straight
line as shown in figure 8.1, § 8.3.


(iii) If r = 0, the variables are uncorrelated. In other words,
there is no linear (straight line) relationship between the variables.
However, r = 0 does not imply that the variables are independent
[c.f. Property III page 529].


(iv) For other values of r lying between +1 and −1, there are
no set guidelines for its interpretation. The maximum we can conclude
is that the nearer the numerical value of r is to 1, the closer is the
relation between the variables, and the nearer the value of r is to 0,
the less close is the relationship between them. One should be very
careful in interpreting the value of r as it is often mis-interpreted.


(v) The reliability or the significance of the value of the correlation
coefficient depends on a number of factors. One of the ways
of testing the significance of r is finding its probable error [c.f. § 8.5],
which in addition to the value of r takes into account the size of
the sample also. A rigorous test is given by Student's t-test in
sampling theory.


(vi) Another more useful measure for interpreting the value of
r is the coefficient of determination [c.f. § 8.9]. It is observed there
that the closeness of the relationship between two variables is not
proportional to r.
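
Point (iii) above can be illustrated with a small Python sketch on hypothetical data (the values below are made up for illustration and do not come from the text). Here y is completely determined by x through y = x², yet r works out to zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data: x symmetric about zero, y an exact function of x
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0: uncorrelated, although y depends entirely on x
```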


If r is the observed correlation coefficient in a sample of n
pairs of observations then its standard error, usually denoted by
S.E.(r), is given by :

$$\text{S.E.}(r) = \frac{1 - r^2}{\sqrt{n}} \qquad \ldots(8.10)$$

Probable error of the correlation coefficient is given by :

$$\text{P.E.}(r) = 0.6745 \times \text{S.E.}(r) = 0.6745\,\frac{1 - r^2}{\sqrt{n}} \qquad \ldots(8.11)$$

Reason for taking the factor 0.6745 is that in a normal distribution
50% of the observations lie in the range μ ± 0.6745 σ, where
μ is the mean and σ is the s.d.

According to Secrist, "The probable error of the correlation coefficient
is an amount which if added to and subtracted from the mean
correlation coefficient, produces amounts within which the chances are
even that a coefficient of correlation from a series selected at random
will fall."

Uses of Probable Error 

1. The probable error of correlation coefficient may be used 
to determine the limits within which the population correlation co- 
efficient may be expected to lie. 

Limits for population correlation coefficient are

$$r \pm \text{P.E.}(r) \qquad \ldots(8.12)$$

This implies that if we take another random sample of the
same size n from the same population from which the first sample
was taken, then the observed value of the correlation coefficient,
say, r′ in the second sample can be expected to lie within the
limits given in (8.12).

2. P.E.(r) may be used to test if an observed value of sample
correlation coefficient is significant of any correlation in the
population. The following guide-lines may be used :

(i) If r < P.E.(r), i.e., if the observed value of r is less than its
P.E., then correlation is not at all significant.

(ii) If r > 6 P.E.(r), i.e., if observed value of r is greater than
6 times its P.E., then r is definitely significant.

(iii) In other situations, nothing can be concluded with
certainty.

Important Remarks 1. Sometimes, P.E. may lead to
fallacious conclusions particularly when n, the number of pairs of
observations, is small. In order to use P.E. effectively, n should
be fairly large. However, a rigorous test for testing the significance
of an observed sample correlation coefficient is provided by
Student's t-test.
2. P.E. can be used only under the following conditions : 


(i) The data must have been drawn from a normal popula- 
tion. 

(ii) The conditions of random sampling should prevail in 
selecting sampled observations. 
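
Formulas (8.10) to (8.12) and the guide-lines above can be collected in a small Python sketch. The function names and the sample values r = 0.5, n = 25 are hypothetical, chosen only for illustration; comparing the absolute value of r with P.E.(r), so that negative coefficients are handled too, is an assumption added here and is not stated in the text.

```python
from math import sqrt

def standard_error(r, n):
    """S.E.(r) = (1 - r^2) / sqrt(n), formula (8.10)."""
    return (1 - r ** 2) / sqrt(n)

def probable_error(r, n):
    """P.E.(r) = 0.6745 * S.E.(r), formula (8.11)."""
    return 0.6745 * standard_error(r, n)

def limits(r, n):
    """Limits r - P.E.(r) and r + P.E.(r), formula (8.12)."""
    pe = probable_error(r, n)
    return r - pe, r + pe

def significance(r, n):
    """Guide-lines (i)-(iii); abs(r) is used so that a negative r is
    judged by its magnitude (an assumption, not stated in the text)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"

# Hypothetical sample: r = 0.5 observed from n = 25 pairs
r, n = 0.5, 25
pe = probable_error(r, n)
low, high = limits(r, n)
verdict = significance(r, n)
print(round(pe, 4), round(low, 4), round(high, 4), verdict)
```

With these illustrative values r lies between P.E.(r) and 6 P.E.(r), so the sketch reports the case in which nothing can be concluded with certainty.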

Example 8.11. Calculate coefficient of correlation between X
and Y series from the following data and calculate its Probable
Error also :

X :   78   89   96   69   59   79   68   61

Y :  125  137  156  112  107  136  123  108

(Take 69 as working mean for X and 112 for that for Y).

(Punjab U. B. Com. II, Sept. 1981)

Solution. Taking 69 and 112 as working means for X and Y
series respectively, let us take :
u = x − 69, v = y − 112

CALCULATIONS FOR CORRELATION COEFFICIENT

  x      y     u=x−69   v=y−112     u²      v²      uv
 78    125       9        13        81     169     117
 89    137      20        25       400     625     500
 96    156      27        44       729    1936    1188
 69    112       0         0         0       0       0
 59    107     −10        −5       100      25      50
 79    136      10        24       100     576     240
 68    123      −1        11         1     121     −11
 61    108      −8        −4        64      16      32

Total         Σu=47    Σv=108   Σu²=1475  Σv²=3468  Σuv=2116

$$r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\;\sqrt{n\sum v^2 - (\sum v)^2}}$$

$$= \frac{8 \times 2116 - 47 \times 108}{\sqrt{8 \times 1475 - (47)^2}\;\sqrt{8 \times 3468 - (108)^2}} = \frac{16928 - 5076}{\sqrt{11800 - 2209}\;\sqrt{27744 - 11664}}$$

$$= \frac{11852}{\sqrt{9591}\;\sqrt{16080}} = \frac{11852}{97.9337 \times 126.806} = \frac{11852}{12418.58} = 0.9544$$

Since correlation coefficient is independent of the change of
origin, we have

$$r_{xy} = r_{uv} = 0.9544$$

Probable Error P.E.(r)

$$\text{P.E.}(r) = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - 0.9109}{\sqrt{8}} = \frac{0.6745 \times 0.0891}{2.8284} = \frac{0.0601}{2.8284} = 0.0212$$


Example 8.12. (a) Calculate Karl Pearson's coefficient of correlation
from the following data :

Marks in Maths.     : 45 70 65 30 90 40 50 75 85 60
Marks in Statistics : 35 90 70 40 95 40 60 80 80 50

Also calculate its probable error. Assume 60 and 65 as working
means. (Delhi Uni. B. Com., 1970)

(b) Hence discuss if the value of r is significant or not. Also
compute the limits within which the population correlation coefficient
may be expected to lie.

Solution. (a) Let the marks in mathematics be denoted by
the variable X and the marks in statistics by the variable Y. It may
be noted that we can take out the factor 5 common in each of the
X and Y series. Hence it will be convenient to change the scale
also. Taking 60 and 65 as working means for X and Y series
respectively, let us take :

$$u = \frac{x - 60}{5} \quad\text{and}\quad v = \frac{y - 65}{5}$$

CALCULATIONS FOR CORRELATION COEFFICIENT

  x     y      u     v     u²    v²    uv
 45    35     −3    −6      9    36    18
 70    90      2     5      4    25    10
 65    70      1     1      1     1     1
 30    40     −6    −5     36    25    30
 90    95      6     6     36    36    36
 40    40     −4    −5     16    25    20
 50    60     −2    −1      4     1     2
 75    80      3     3      9     9     9
 85    80      5     3     25     9    15
 60    50      0    −3      0     9     0

Totals :       2    −2    140   176   141

We have :

$$r_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{[n\sum u^2 - (\sum u)^2]\,[n\sum v^2 - (\sum v)^2]}} = \frac{10 \times 141 - 2 \times (-2)}{\sqrt{(10 \times 140 - 4)(10 \times 176 - 4)}} = \frac{1414}{\sqrt{1396 \times 1756}}$$

⇒ log r = log 1414 − ½ [log 1396 + log 1756]
        = 3.1504 − ½ [3.1449 + 3.2445] = 3.1504 − ½ × 6.3894
        = 3.1504 − 3.1947 = −0.0443 = $\bar{1}.9557$
⇒ r = Antilog ($\bar{1}.9557$) = 0.9031 ≈ 0.9
∴ r_xy = r_uv = 0.9

Probable Error of Correlation Coefficient is given by :

$$\text{P.E.}(r) = 0.6745\,\frac{1 - r^2}{\sqrt{n}} = \frac{0.6745 \times 0.19}{\sqrt{10}} = \frac{0.128155}{3.1623} = 0.0405$$

(b) Significance of r. We have
r = 0.9 and 6 P.E.(r) = 6 × 0.0405 = 0.2430
Since r is much greater than 6 P.E.(r), the value of r is highly
significant.

Remark. Since the value of r is significant, it implies that
ordinarily, higher the marks of a candidate in Mathematics, higher
is his score in Statistics also and lower the marks of a candidate in
Mathematics, lower is his score in Statistics also. However, it does
not mean that all the students who are good in Mathematics are
also good in Statistics and all those students who are poor in
Mathematics are also poor in Statistics. It should be clearly borne
in mind that "the co-efficient of correlation expresses the relationship
between two series and not between the individual items of the series".


Limits for Population Correlation Coefficient are :
r ± P.E.(r) = 0.9031 ± 0.0405, i.e., 0.8626 and 0.9436


This implies that if we take another sample of size 10 from
the same population, then its correlation coefficient can be expected
to lie between 0.8626 and 0.9436.


Example 8.13. Test the significance of correlation for the
following values based on the number of observations (i) 10, and
(ii) 100 : r = +0.4 and +0.9. (Rajasthan B. Com. 1970)


Solution. We know that an observed value of r is definitely
significant if

r > 6 P.E.(r), i.e., r / P.E.(r) > 6

In this case, we have :

No. of          r      P.E.(r) = 0.6745 (1 − r²)/√n               r / P.E.(r)   Conclusion
observations
   10          0.4     0.6745 × [1 − (0.4)²]/√10  = 0.1792           2.23       Not significant
  100          0.4     0.6745 × [1 − (0.4)²]/√100 = 0.0567           7.06       Significant
   10          0.9     0.6745 × [1 − (0.9)²]/√10  = 0.0405          22.21       Highly significant
  100          0.9     0.6745 × [1 − (0.9)²]/√100 = 0.0128          70.23       Highly significant
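
The four cases of Example 8.13 can be recomputed with the short Python sketch below, which applies only the criterion r > 6 P.E.(r) quoted in the solution. The variable names are illustrative.

```python
from math import sqrt

def probable_error(r, n):
    """P.E.(r) = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

# (number of observations, observed r) for the four cases of Example 8.13
cases = [(10, 0.4), (100, 0.4), (10, 0.9), (100, 0.9)]

results = []
for n, r in cases:
    pe = probable_error(r, n)
    definitely_significant = r > 6 * pe
    results.append((n, r, round(pe, 4), definitely_significant))
    print(n, r, round(pe, 4), round(r / pe, 2), definitely_significant)
```

Only the first case (n = 10, r = 0.4) fails the criterion; the same value of r becomes significant when it is based on 100 observations, which shows how strongly the probable error depends on the sample size.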
EXERCISE 8.2


1. Explain the meaning and significance of the concept of correlation.
How will you calculate it from the statistical point of view ?


2. (a) Define Karl Pearson's coefficient of correlation. What is it
intended to measure ?

(b) What are the special characteristics of Karl Pearson's coefficient of
correlation ? What are the underlying assumptions on which this formula
is based ?

(c) How do you interpret a calculated value of Karl Pearson's coefficient
of correlation ? Discuss in particular the values of r = 0, r = −1 and
r = +1.

3. State, giving reasons, whether the following statements are true or
false.
(a) Coefficient of correlation between two variables must be in the
same units as the original data.

(Delhi U. B.Com. 1983)


(b) The correlation coefficient between rainfall and wheat yield per
hectare was found to be 0.8. Hence more rainfall means more agricultural
production.
[Osmania U. B.Com (Hons.) April 1983]


4. Discuss the statistical validity of the following statements :


(a) "High positive coefficient of correlation between increase
in the sale of newspapers and increase of the number of crimes leads to the
conclusion that newspaper reading may be responsible for the increase in the
number of crimes."


(b) "A high positive value of r between the increase in cigarette smoking
and increase in lung cancer establishes that cigarette smoking is responsible
for lung cancer."


5. Calculate Karl Pearson's co-efficient of correlation from the following
data :

X :   6    8   12   15   18   20   24   28   31
Y :  10   12   15   15   18   25   22   26   28
(Delhi U. B.Com. III, 1984)
Ans. r_xy = 0.9587

6. Find the correlation between x and y series :

x-series :  17   18   19   19   20   20   21   21   22   23
y-series :  12   16   14   11   15   19   22   16   15   20
(Guru Nanak Dev Uni. B.Com. II, April 1982)
Ans. r_xy = 0.6149

7. Calculate Karl Pearson's coefficient of correlation between x and y
from the following data :

x-series :  80   60   51   69   58   62   64   72   56   58
y-series :  45   71   60   57   62   58   48   50   62   69

[Punjab U. B.A. (Econ. Hons.) 1982]
Ans. r_xy = −0.7199


8. Making use of the data given below, calculate the coefficient of
correlation r :


Case : Actu RC a: mg mg 5g 
Xo: soc icu ds n 9 
das Cp qus S e ЕН а o4 


(Delhi U. B.Com. 1982)
Ans. r = 0.8958
9. Calculate product moment coefficient of correlation for the following
data of sales (x) and expenses (y) in lakhs of rupees of 10 firms.
x 46:133. 41:5738.5.36 745-34 2237: 50 — 40 
y. 12 13 24 16 15 14 21 17 19 19 
(Bombay U. B.Com., April 1983)


Ans. r_xy = 0.0213


10. Compute Karl Pearson's coefficient of correlation in the following
series relating to cost of living and wages : 


Wages 
(Rs.) : 100 101 103 102 100 99 97 98 96 95 


Cost of 
living : 98 99 99 97 95 92 95 94 90 91 


[Delhi U. B.Com. (External) 1982 ; Bangalore Uni. B. Com., Nov. 1981; 


Gujarat Univ. B.Com., Oct. 1980] 
Ans. r = 0.8472


11. From the following data, find out the correlation coefficient bet- 
ween heights of fathers and sons. 


Height of fathers 
in inches жару Sa 66 67 67 68 69 70 72 
Height of sons А 
in inches 2367 OS 850 68 72.72 69 71 
[Osmania U. B.Com. (Hons.) April 1983] 
Ans. r = 0.603


12. Calculate the Karl Pearson's coefficient of correlation for the
following ages of husbands and wives at the time of their marriage :

Age of husband (in years) ; 23 27.28 28. 28 ›30°, 30.334 23:938. 

Age of wife (in years) «dg 9129/5:22- -27--: 21: 7729 21" 129 a cee 
[Kurukshetra Univ. B. Com. Sept. 1980 ; Delhi Univ. B. Com. (Hons.) 1980]
Ans. r = 0.8013


13. Find Karl Pearson's correlation coefficient between age and playing
habit of the following students :


Age             :  15   16   17   18   19   20
No. of students : 250  200  150  120  100   80
Regular players : 200  150   90   48   30   16


[Delhi U. B. Com. (Hons.) 1973 ; Kerala U. B. Com. April 1977]
Hint. Find r between Age (X) and percentage of regular players (Y).
Ans. r_xy = −0.9897
14. The following table gives the distribution of the total population and
those who are totally or partially blind among them. Find out if there is any
relation between age and blindness.


Age (Years)            0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80
No. of Persons ('000)   100     60     40     36     24     11      6      3
Blind                    55     40     40     40     36     22     18     15


[Guru Nanak Dev Uni. B. Com. 1979 ; Rajasthan Uni. B. Com. Oct. 1980]


Hint. Here we shall find the correlation coefficient between age (X) and
the number of blinds per lakh (Y) as given in the following table.


X :   5   15   25   35   45   55   65   75
Y :  55   67  100  111  150  200  300  500
Ans. r = 0.8982


15. From the following table find out if there is any correlation between
age and blindness :


Age No. of Persons Blind
(in thousands) (Number)

10—20 100 50
20—30 60 40
30—40 40 40
40—50 36 40
50—60 24 30
60—70 20
70—80 6 10
80—90 3 10


(Kurukshetra U. B. Com. II, Sept. 1982)
Ans. 0.8946
16. A computer while calculating the correlation coefficient between the
variables X and Y obtained the following results :


N = 30, ΣX = 120, ΣX² = 600, ΣY = 90, ΣY² = 250, ΣXY = 356
It was, however, later discovered at the time of checking that it had
copied down two pairs of observations as :


— — 


ХІҮ, : X|Y 
EJEN while the correct values were, ip 115 
12] 7 10| 8 


Obtain the correct value of the correlation coefficient between X and Y. 
(Madurai Univ. B. Com., April 1978) 
Ans. r = 0.149


17. Coefficient of correlation between X and Y for 20 items is 0.3 ; mean
of X is 15 and that of Y 20, standard deviations are 4 and 5 respectively. At
the time of calculations one item 27 was wrongly taken as 17 in case of X series
and 35 instead of 30 in case of Y series. Find the correct coefficient of
correlation.


Ans. Correct value of correlation coefficient = 0.504.


18. In order to find the correlation coefficient between two variables X
and Y from 12 pairs of observations, the following calculations were made :


ΣX = 30, ΣY = 5, ΣX² = 670, ΣY² = 285, ΣXY = 334
On subsequent verification it was found that the pair (X = 11, Y = 4) was
copied wrongly, the correct value being (X = 10, Y = 14). Find the correct value
of correlation coefficient.


Ans. 0.78.


19. What do you understand by the probable error of correlation
coefficient ? Explain how it can be used to :


(i) Interpret the significance of an observed value of sample correlation
coefficient.


(ii) Determine the limits for the population correlation coefficient.


20. Calculate the co-efficient of correlation and find its probable error
from the following data :
X :   7    6    5    4    3    2    1
Y :  18   16   14   12   10    6    8
(Kurukshetra U. B. Com. II, April 1982)
Ans. r_xy = 0.9643 ; P.E.(r) = 0.0179.
21. Calculate the coefficient of correlation and probable error from the 
following data : 
X 3 1 2 3 4 5 6 77 98 9 10 
Tur 20 16 14 10 40 59-38 77-56 5 
(Punjab Uni. B. Com. II, Sept. 1982)
Ans. r_xy = −0.95 ; P.E.(r) = 0.0208
22. Calculate the coefficient of correlation (r) between the marks in
Statistics (x) and Accountancy (y) of 12 students from the following data :


X: 52. 74: 93. 55° 4р 23 92 64 400 71 33 т 
Y: 45 8 63 60 35 40 700 58 433 644 51 75 


Also determine the probable error of r.


(Punjab Uni. B.Com. 1981)
Ans. r_xy = 0.7885 ; P.E.(r) = 0.0737.


23. Calculate Karl Pearson's coefficient of correlation for the following
series.
Price (in Rs.)    110-111  111-112  112-113  113-114  114-115  115-116
Demand (in kg.)     600      640      640      680      700      780
Price (in Rs.)    116-117  117-118  118-119
Demand (in kg.)     830      900     1,000
Also calculate the probable error of the correlation coefficient. From
your result can you assert that the demand is correlated with price ?
(Delhi Uni. B.Com. 1976)
Ans. r = 0.9651, P.E.(r) = 0.0154.


24. A student calculates the value of r as 0.7 when the number of
items (n) in the sample is 25. Find the limits within which r lies for another
sample from the same universe. (Mysore Uni. B.Com. 1972)


Ans. Required limits are 0.767 and 0.633.
25. In a correlation study the results were
Σxy = 40, N = 100, Σx² = 80, Σy² = 20
The correlation coefficient is
(a) +1.0 (b) −1.0 (c) zero (d) None of these.
Here x = X − X̄, y = Y − Ȳ.
[C.A. (Intermediate) May 1983]
Ans. (a)
26. Given r = 0.8, Σxy = 60, σ_y = 2.5 and Σx² = 90.
Find the number of items. Here x and y are deviations from respective
means. [C.A. (Intermediate) May 1982]
Ans. n = 10.
27. From the following data calculate the coefficient of correlation
between two variables X and Y :
(i) Number of items in X-series or Y-series = 12.

(ii) Sum of the squares of deviation from mean : 360 for X-series and
250 for Y-series.

(iii) Sum of the product of deviations of the two series from their
respective means = 225.
[I.C.W.A. (Final), June 1983]
Ans. r_xy = 0.75
28. The coefficient of correlation between two variables X and Y is 0.38.
Their covariance is 10.2. The variance of X is 16. Find the standard deviation
of Y-series.


Ans. σ_y = 6.71.
29. (a) The corresponding values of the variables are given below :


X :  2    3    5    8    9
Y :  4    6   10   16   18
The correlation between the variables is : −1, 0, 1 or none of these.
Justify your answer. [C.A. (Intermediate) May 1982]
Ans. r_xy = +1


(b) Are the following statements valid ? Justify your answer :


(i) Positive value of correlation coefficient between x and y implies
that if x decreases, y tends to increase.


(ii) Correlation coefficient is independent of the origin of reference but
is dependent on the units of measurement.


(iii) Correlation coefficient between x and y turned out to be 1.02.
[I.C.W.A. (Final), Dec. 1980]
Ans. (i) False, (ii) False, (iii) Impossible.


30. Comment on the following, giving reasons for your conclusions :


(a) If the correlation coefficient between two variables X and Y is positive,
then

(i) the correlation coefficient between −X and −Y is positive ;

(ii) the correlation coefficient between X and −Y or −X and Y is
positive.

(b) The correlation coefficient between two variables is 1.4.

(c) If the variables are independent then they are uncorrelated.

(d) The correlation coefficient can be calculated only if the two variables are
measured in the same units.

(e) If the correlation coefficient between two variables is zero, then the
variables are independent.

(f) The value of r cannot be negative.

(g) r measures every type of relationship between the two variables.

(h) The closeness of relationship between two variables is proportional
to r.



8.6. Correlation in Bivariate Frequency Table. If in a bivariate
distribution the data are fairly large, they may be summarised
in the form of a two-way table. Here for each variable, the values
are grouped into various classes [not necessarily the same for both
the variables], keeping in view the same considerations as in the
case of univariate distribution. For example if there are m classes
for the X-variable series and n classes for the Y-variable series then
there will be m × n cells in the two-way table. By going through
the different pairs of the values (x, y) and using tally marks we can
find the frequency for each cell and thus obtain the so-called bivariate
frequency table as shown below.


BIVARIATE FREQUENCY TABLE

[Layout of the table : the classes of the X-series, with their mid points x,
run across the top and the classes of the Y-series, with their mid points y,
run down the side. The body of the table carries the cell frequencies
f(x, y) ; the last column carries f_y, the total of frequencies of Y ; the
last row carries f_x, the total of frequencies of X ; and the grand total
is Σf_x = Σf_y = N.]


Here f(x, y) is the frequency of the pair (x, y).


< 


The formula for computing the correlation coefficient between
X and Y for the bivariate frequency table is

$$r = \frac{N\sum\sum xy\,f(x, y) - (\sum x f_x)(\sum y f_y)}{\sqrt{[N\sum x^2 f_x - (\sum x f_x)^2]\,[N\sum y^2 f_y - (\sum y f_y)^2]}}$$

where N is the total frequency. If there is no confusion we may
use the formula :

$$r = \frac{N\sum fxy - (\sum fx)(\sum fy)}{\sqrt{[N\sum fx^2 - (\sum fx)^2]\,[N\sum fy^2 - (\sum fy)^2]}} \qquad \ldots(8.13)$$

where the frequency f used for the product xy is nothing but
f(x, y) and the frequency f used in the sums Σfx and Σfy are
respectively the frequencies of x and y, viz., f_x and f_y as explained in the
above table. If we change the origin and scale in X and Y by
transforming them to the new variables U and V by

$$u = \frac{x - A}{h} \quad\text{and}\quad v = \frac{y - B}{k}$$

where h and k are the widths of the x-classes and y-classes
respectively and A and B are constants, then by property II of r we
have :

$$r_{xy} = r_{uv} = \frac{N\sum fuv - (\sum fu)(\sum fv)}{\sqrt{[N\sum fu^2 - (\sum fu)^2]\,[N\sum fv^2 - (\sum fv)^2]}} \qquad \ldots(8.14)$$

We shall explain the method by means of examples.


Example 8.14. Family income and its percentage spent on food
in the case of hundred families gave the following bivariate frequency
distribution. Calculate the coefficient of correlation and interpret its
value.


Food Expenditure                  Family income (Rs.)
(in %)             200-300   300-400   400-500   500-600   600-700
10-15                 -         -         -         3         7
15-20                 -         4         9         4         3
20-25                 7         6        12         5         -
25-30                 3        10        19         8         -


[Delhi Uni. M.B.A. 1981]


Solution. Let us denote the income (in Rupees) by the variable
X and the food expenditure (%) by the variable Y.


Steps : 1. Find the mid points of various classes for X and Y
series.


2. Change the origin and scale in X-series and Y-series by
transforming them to new variables u and v as defined below :

$$u = \frac{x - A}{h} = \frac{x - 450}{100} \quad\text{and}\quad v = \frac{y - B}{k} = \frac{y - 17.5}{5}$$

where x denotes the mid-points of the X-series and y denotes the
mid-points of the Y-series, and h and k are the magnitudes of the
classes of X and Y series respectively.

3. For each class of X, find the total of cell frequencies of all
the classes of Y and similarly for each class of Y find the total of
cell frequencies of all the classes of X.

4. Multiply the frequencies of x by the corresponding values
of the variable u and find the sum Σfu.

5. Multiply the frequencies of y by the corresponding values
of the variable v and find the sum Σfv.


6. Multiply the frequency of each cell by the corresponding
values of u and v and write the product f × u × v within a square in
the right hand top corner for each cell. For example for u = −1
and v = 2, the cell frequency f is 10. Therefore, the product of f,
u and v is (−1) × (2) × 10 = −20 which is written within a square on
the right hand top of the cell. Similarly for u = 2 and v = 1, the product
fuv = 0 × 2 × 1 = 0, and so on for all the remaining cell frequencies.


7. Add together all the figures in the top corner squares as
obtained in step 6 to get the last column fuv for each of the X and
Y series. Finally, find the total of the last column to get Σfuv.


8. Multiply the values of fu and fv by the corresponding values
of u and v respectively to get the columns for fu² and fv². Add these
values to obtain Σfu² and Σfv².


The above calculations are shown in the table on page 456.


$$r_{uv} = \frac{N\sum fuv - (\sum fu)(\sum fv)}{\sqrt{[N\sum fu^2 - (\sum fu)^2]\,[N\sum fv^2 - (\sum fv)^2]}}$$

$$= \frac{100 \times (-48) - 0 \times 100}{\sqrt{(100 \times 120 - 0) \times [100 \times 200 - (100)^2]}} = \frac{-4800}{\sqrt{12000 \times (20000 - 10000)}} = \frac{-4800}{\sqrt{12000 \times 10000}} = \frac{-48}{\sqrt{12000}}$$

= −Antilog [log (48/√12000)]
= −Antilog [log 48 − ½ log 12000]
= −Antilog [1.6812 − ½ × 4.0792]
= −Antilog [1.6812 − 2.0396]
= −Antilog [−0.3584] = −Antilog [$\bar{1}.6416$]
= −0.4381.
Hence r_xy = r_uv = −0.4381 [By property II of r]
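
Steps 1 to 8 of Example 8.14 can be retraced with the Python sketch below. It assumes that the cell for food expenditure 10-15% and income Rs. 600-700 holds 7 families, the value consistent with N = 100, Σfu = 0 and Σfuv = −48 used in the solution; the list names are illustrative.

```python
from math import sqrt

# Mid-points of the classes (step 1)
x_mid = [250, 350, 450, 550, 650]   # family income (Rs.)
y_mid = [12.5, 17.5, 22.5, 27.5]    # food expenditure (%)

# Cell frequencies f(x, y): one row per y-class, one column per x-class.
# Assumption: the 10-15% / Rs. 600-700 cell is taken as 7 (see lead-in).
freq = [
    [0, 0, 0, 3, 7],
    [0, 4, 9, 4, 3],
    [7, 6, 12, 5, 0],
    [3, 10, 19, 8, 0],
]

# Change of origin and scale (step 2)
u = [(x - 450) / 100 for x in x_mid]
v = [(y - 17.5) / 5 for y in y_mid]

# Marginal frequencies (step 3)
f_x = [sum(row[j] for row in freq) for j in range(len(x_mid))]
f_y = [sum(row) for row in freq]
N = sum(f_y)

# Sums needed by formula (8.14) (steps 4 to 8)
s_fu = sum(f * a for f, a in zip(f_x, u))
s_fv = sum(f * b for f, b in zip(f_y, v))
s_fu2 = sum(f * a * a for f, a in zip(f_x, u))
s_fv2 = sum(f * b * b for f, b in zip(f_y, v))
s_fuv = sum(freq[i][j] * u[j] * v[i]
            for i in range(len(y_mid)) for j in range(len(x_mid)))

r = (N * s_fuv - s_fu * s_fv) / sqrt(
    (N * s_fu2 - s_fu ** 2) * (N * s_fv2 - s_fv ** 2))
print(N, s_fu, s_fv, s_fu2, s_fv2, s_fuv, round(r, 4))
```

Direct evaluation gives about −0.4382; the value −0.4381 in the text reflects the four-figure logarithm tables used there.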
CALCULATION OF CORRELATION COEFFICIENT
Example 8.15. Calculate Karl Pearson's co-efficient of correlation
from the data given below :


Age in Years 
20 


19 


[Delhi Uni. B.Com. (Hons.) 1974] 


Solution. If we denote the age in years by the variable X and 
the mid-point of the class intervals of marks by the variable Y and 
take 


u-X—20; and y= BS ^ 
then the bivariate correlation table is as given on page 458.
We have

$$r_{uv} = \frac{N\sum fuv - (\sum fu)(\sum fv)}{\sqrt{[N\sum fu^2 - (\sum fu)^2]\,[N\sum fv^2 - (\sum fv)^2]}}$$

$$= \frac{40 \times (-38) - 9 \times 6}{\sqrt{[40 \times 47 - (9)^2] \times [40 \times 50 - (6)^2]}} = \frac{-1520 - 54}{\sqrt{(1880 - 81)(2000 - 36)}} = \frac{-1574}{\sqrt{1799 \times 1964}}$$

= −Antilog [log (1574/√(1799 × 1964))]
= −Antilog [log 1574 − ½ (log 1799 + log 1964)]
= −Antilog [3.1970 − ½ (3.2551 + 3.2931)]
= −Antilog [3.1970 − 3.2741] = −Antilog [−0.0771]
= −Antilog [$\bar{1}.9229$] = −0.8373.


But since correlation coefficient is independent of change of
origin and scale [c.f. Property II of r], we get

r_xy = r_uv = −0.8373
EXERCISE 8.3


1. What is a bivariate table ? Write the formula you use for calculating 
coefficient of correlation from such a table, explaining the symbols used. What 
does a negative value of the coefficient of correlation indicate ? 


2. Calculate the coefficient of correlation between the ages of husbands
and wives and its probable error from the following table :


Ages of wives               Ages of husbands (years)
(years)          20-30   30-40   40-50   50-60   60-70   Total
15-25              5       9       3       -       -       17
25-35              -      10      25       2       -       37
35-45              -       1      12       2       -       15
45-55              -       -       4      16       5       25
55-65              -       -       -       4       2        6
Total              5      20      44      24       7      100

[Karnataka U. B.Com., April 1982 ; Delhi Uni. B.Com. (Hons.) 1972]
Ans. r = 0.823.
3. Compute the coefficient of correlation between dividends and prices
of securities as given below :


Annual Dividends (in Rs.) : 6-8, 8-10, 10-12, 12-14, 14-16, 16-18

Security Prices (in Rs.) : 130-140, 120-130, 110-120, 100-110, 90-100,
80-90, 70-80


[Guru Nanak Dev. Uni. B.Com. Sept. 1975]
Ans. 0.71


4. Calculate the product moment coefficient of correlation for the following
bivariate distribution.


(Bombay Uni. B.Com. May 1982)

5. Following table gives ages of husbands and wives in a community.
Find the correlation coefficient.


Age of husbands in years


Age of wives in years


20—25 25—30


[Osmania U. B.Com. (Hons.) April 1983]


6. From the data given below find Karl Pearson's coefficient of correlation
between the ages of husbands and wives :

Age of wives            Age of husbands (in years)
(in years)       20-30   30-40   40-50   50-60   60-70   Total
55-65              -       -       -       4       2        6
45-55              -       -       4      16       5       25
35-45              -       1      12       2       -       15
25-35              -      10      25       2       -       37
15-25              5       9       3       -       -       17
Total              5      20      44      24       7      100
[Punjab Uni. B.A. (Econ. Hons. II), 1981]



7. A sample of 100 firms was taken and these were classified according
to the sales executed by them and profits earned consequently. The results are
shown in the table given below. Determine the correlation between sales and
profits and also the probable error.


Profits                     Sales (in lakhs of Rs.)
(in '000 Rs.)    7-8    8-9    9-10   10-11   11-12   12-13   Total
50-70             5      3      -       -       -       -        8
70-90             3      8      5       4       -       -       20
90-110            1      -      7      11       2       2       23
110-130           -      4      5      15       6       -       30
130-150           -      -      2       7       4       6       19
Total             9     15     19      37      12       8      100
(Punjab Univ. B. Com. April 1983)

8.7. Rank Correlation Method. Sometimes we come across
statistical series in which the variables under consideration are not
capable of quantitative measurement but can be arranged in serial
order. This happens when we are dealing with qualitative characteristics
(attributes) such as honesty, beauty, character, morality,
etc., which cannot be measured quantitatively but can be arranged
serially. In such situations Karl Pearson's coefficient of correlation
cannot be used as such. Charles Edward Spearman, a British
psychologist, developed a formula in 1904 which consists in obtaining
the correlation coefficient between the ranks of n individuals in
the two attributes under study.


Suppose we want to find if two characteristics A, say, intelli- 
gence and B, say, beauty are related or not. Both the characteristics 
are incapable of quantitative measurements but we can arrange a 
group of n individuals in order of merit (ranks) w.r.t. proficiency in 
the two characteristics. Let the random variables X and Y denote 
the ranks of the individuals in the characteristics A and B respectively.
If we assume that there is no tie, i.e., if no two individuals
get the same rank in a characteristic then, obviously, X and Y
assume numerical values ranging from 1 to n.


The Pearsonian correlation coefficient between the ranks X
and Y is called the rank correlation coefficient between the
characteristics A and B for that group of individuals.

Spearman's rank correlation coefficient, usually denoted by ρ
(rho), is given by the formula

\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}    ...(8.15)

where d is the difference between the pair of ranks of the same
individual in the two characteristics and n is the number of pairs.


8.7.1. Limits for ρ. Spearman's rank correlation coefficient
lies between -1 and +1, i.e.,

-1 \le \rho \le 1    ...(8.16)

Remark. Since the square of a real quantity is always
non-negative, i.e., d² ≥ 0, Σd² being the sum of non-negative quantities is
also non-negative. Further, since n is also positive, we get from
(8.15)
ρ = 1 - [some non-negative quantity]
⇒ ρ ≤ 1,
the sign of equality holds if and only if Σd² = 0. Now, Σd² = 0 if
and only if each d = 0, i.e., the ranks of an individual are the same in
both the characteristics. Following table gives one such possibility.
On the other hand, ρ will be minimum if Σd² is maximum, i.e.,
if the deviations d are maximum, which is so if the ranks of the
individuals in the two characteristics are in the reverse (opposite)
order as given in the following table.


Individual 
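
The lower limit can be checked directly. The following is a short sketch, assuming the reversed ranking is labelled x_i = i and y_i = n + 1 - i (this labelling is an assumption made for illustration), so that d_i = 2i - (n + 1):

```latex
\begin{aligned}
\sum_{i=1}^{n} d_i^2
  &= \sum_{i=1}^{n} \bigl(2i - (n+1)\bigr)^2
   = 4\sum_{i=1}^{n} i^2 - 4(n+1)\sum_{i=1}^{n} i + n(n+1)^2 \\
  &= \frac{2n(n+1)(2n+1)}{3} - 2n(n+1)^2 + n(n+1)^2
   = \frac{n(n^2-1)}{3}, \\
\rho &= 1 - \frac{6}{n(n^2-1)} \cdot \frac{n(n^2-1)}{3} = 1 - 2 = -1 .
\end{aligned}
```

For instance, with n = 5 the differences are -4, -2, 0, 2, 4, so Σd² = 40 = 5(25 - 1)/3 and ρ = 1 - 240/120 = -1.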


8.7.2. Computation of Rank Correlation Coefficient. We shall
discuss below the method of computing the Spearman's rank
correlation coefficient ρ under the following situations :

(i) When actual ranks are given.
(ii) When ranks are not given.
Case (i) When Actual Ranks are Given :
In this situation the following steps are involved :
(i) Compute d, the difference of ranks.
(ii) Compute d².
(iii) Obtain the sum Σd².
(iv) Use formula (8.15) to get the value of ρ.


Example 8.16. The ranks of the same 15 students in two
subjects A and B are given below; the two numbers within the
brackets denoting the ranks of the same student in A and B
respectively. (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11),
(10,15), (11,9), (12,5), (13,14), (14,12), (15,13).

Use Spearman's formula to find the rank correlation coefficient.

[I.C.W.A. (Final) December 1977]
Solution.
CALCULATION OF SPEARMAN'S CORRELATION COEFFICIENT

Rank in A (x)   Rank in B (y)   d = x - y     d²
      1              10            -9         81
      2               7            -5         25
      3               2             1          1
      4               6            -2          4
      5               4             1          1
      6               8            -2          4
      7               3             4         16
      8               1             7         49
      9              11            -2          4
     10              15            -5         25
     11               9             2          4
     12               5             7         49
     13              14            -1          1
     14              12             2          4
     15              13             2          4
                                 Σd = 0    Σd² = 272

Spearman's rank correlation coefficient ρ is given by :

\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 272}{15(225 - 1)}
     = 1 - \frac{6 \times 272}{15 \times 224} = 1 - \frac{17}{35} = \frac{18}{35} = 0.5143
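
The four steps of Case (i) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name spearman_rho is an assumption introduced here, and the formula is valid only when the ranks are untied.

```python
def spearman_rho(rank_x, rank_y):
    """Return (sum of d^2, rho) using formula (8.15); assumes untied ranks."""
    n = len(rank_x)
    d = [x - y for x, y in zip(rank_x, rank_y)]      # step (i): differences
    assert sum(d) == 0                               # numerical check: sum of d is zero
    sum_d2 = sum(di * di for di in d)                # steps (ii) and (iii)
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))  # step (iv)

# Data of Example 8.16: (rank in A, rank in B) for 15 students.
pairs = [(1, 10), (2, 7), (3, 2), (4, 6), (5, 4), (6, 8), (7, 3), (8, 1),
         (9, 11), (10, 15), (11, 9), (12, 5), (13, 14), (14, 12), (15, 13)]
rank_a = [a for a, _ in pairs]
rank_b = [b for _, b in pairs]

sum_d2, rho = spearman_rho(rank_a, rank_b)
print(sum_d2, round(rho, 4))  # 272 0.5143
```

The assertion on the sum of the differences mirrors the check recommended for hand calculations.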


Example 8.17. Ten competitors in a beauty contest are ranked
by three judges in the following order :
1st Judge :  1   6   5  10   3   2   4   9   7   8
2nd Judge :  3   5   8   4   7  10   2   1   6   9
3rd Judge :  6   4   9   8   1   2   3  10   5   7

Use the rank correlation coefficient to determine which pair of
judges has the nearest approach to common tastes in beauty.

[Kurukshetra U. B.Com. 1980 ; Sept. 78 ; Guru Nanak Dev. U.
B.Com. 1981 ; Punjab U. B.Com. Sept. 1979]
Solution. Let R1, R2 and R3 denote the ranks given by the
first, second and third judges respectively and let ρij be the rank
correlation coefficient between the ranks given by the ith and jth judges,
i ≠ j = 1, 2, 3. Let dij = Ri - Rj be the difference of ranks of an
individual given by the ith and jth judge.


CALCULATION OF RANK CORRELATION COEFFICIENT

R1   R2   R3   d12=R1-R2   d13=R1-R3   d23=R2-R3   d12²   d13²   d23²
 1    3    6       -2          -5          -3         4     25      9
 6    5    4        1           2           1         1      4      1
 5    8    9       -3          -4          -1         9     16      1
10    4    8        6           2          -4        36      4     16
 3    7    1       -4           2           6        16      4     36
 2   10    2       -8           0           8        64      0     64
 4    2    3        2           1          -1         4      1      1
 9    1   10        8          -1          -9        64      1     81
 7    6    5        1           2           1         1      4      1
 8    9    7       -1           1           2         1      1      4

Σd12 = 0, Σd13 = 0, Σd23 = 0 ; Σd12² = 200, Σd13² = 60, Σd23² = 214

We have n=10.

Spearman's rank correlation coefficients are given by :

\rho_{12} = 1 - \frac{6 \sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 200}{10 \times 99} = 1 - \frac{40}{33} = -\frac{7}{33} = -0.2121
\rho_{13} = 1 - \frac{6 \sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 60}{10 \times 99} = 1 - \frac{4}{11} = \frac{7}{11} = 0.6364
\rho_{23} = 1 - \frac{6 \sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 214}{10 \times 99} = 1 - \frac{214}{165} = -\frac{49}{165} = -0.2970

Since ρ13 is maximum, the pair of first and third judges has
the nearest approach to common tastes in beauty.

Remark. Since ρ12 and ρ23 are negative, the pairs of judges (1,2)
and (2,3) have opposite (divergent) tastes for beauty.
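
The pairwise comparison of Example 8.17 can also be sketched in Python. The names spearman_rho, judges and closest_pair are assumptions made for this illustration, and untied ranks are assumed throughout.

```python
from itertools import combinations

def spearman_rho(rank_x, rank_y):
    """Formula (8.15); valid only when no ranks are tied."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Ranks awarded by the three judges of Example 8.17.
judges = {
    1: [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    2: [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    3: [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# Rank correlation for every pair of judges.
rhos = {(i, j): spearman_rho(judges[i], judges[j])
        for i, j in combinations(judges, 2)}
closest_pair = max(rhos, key=rhos.get)   # pair with the largest rho

for pair, value in rhos.items():
    print(pair, round(value, 4))
print("nearest approach to common tastes:", closest_pair)
```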


Case (ii) When Ranks are Not Given.

Example 8.18. Calculate Spearman's rank correlation
coefficient between advertisement cost and sales from the following
data :

Advertisement
cost ('000 Rs.) : 39 65 62 90 82 75 25 98 36 78
Sales (lakhs) :   47 53 58 86 62 68 60 91 51 84

Solution. Let X denote the advertisement cost ('000 Rs.) and
Y denote the sales (lakhs). Ranks are assigned within each series
in descending order, the highest value getting rank 1.


CALCULATION OF RANK CORRELATION COEFFICIENT

 X     Y    Rank of X (x)   Rank of Y (y)   d = x - y    d²
39    47         8               10            -2         4
65    53         6                8            -2         4
62    58         7                7             0         0
90    86         2                2             0         0
82    62         3                5            -2         4
75    68         5                4             1         1
25    60        10                6             4        16
98    91         1                1             0         0
36    51         9                9             0         0
78    84         4                3             1         1
                                             Σd = 0    Σd² = 30

Here n=10

\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{10 \times 99}
     = 1 - \frac{2}{11} = \frac{9}{11} = 0.82
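
When only the original values are given, the ranking step can be sketched as below. The helper name descending_ranks is an assumption introduced for illustration; it gives rank 1 to the largest value and is valid only when all values in a series are distinct.

```python
def descending_ranks(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# Data of Example 8.18.
cost = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]    # advertisement cost ('000 Rs.)
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]   # sales (lakhs)

rank_x = descending_ranks(cost)
rank_y = descending_ranks(sales)
n = len(cost)
sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
rho = 1 - 6 * sum_d2 / (n * (n * n - 1))            # formula (8.15)

print(rank_x)
print(rank_y)
print(sum_d2, round(rho, 4))
```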


Repeated Ranks. In case of attributes if there is a tie, i.e., if
any two or more individuals are placed together in any classification
w.r.t. an attribute, or if in case of variable data there is more
than one item with the same value in either or both the series, then
Spearman's formula (8.15) for calculating the rank correlation
coefficient breaks down, since in this case the variables X [the ranks
of individuals in characteristic A (1st series)] and Y [the ranks of
individuals in characteristic B (2nd series)] do not take the values
from 1 to n and consequently x̄ ≠ ȳ, while in proving (8.15) we had
assumed that x̄ = ȳ.

In this case common ranks are assigned to the repeated items. 
These common ranks are the arithmetic mean of the ranks which 
these items would have got if they were different from each other 
and the next item will get the rank next to the ranks used in computing
the common rank. For example, suppose an item is repeated (occurs twice)
at rank 4. Then the common rank to be assigned to each item is
(4+ 5)/2, i.e., 4.5 which is the average of 4 and 5, the ranks which 
these observations would have assumed if they were different. The 
next item will be assigned the rank 6. If an item is repeated thrice 
at rank 7, then the common rank to be assigned to each value will 
be (7+8+9)/3, i.e., 8 which is the arithmetic mean of 7, 8 and 9, 
viz., the ranks these observations would have got if they were differ- 
ent from each other. The next rank to be assigned will be 10. 
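
The rule for common ranks can be sketched as follows. The helper average_ranks and the sample values are assumptions made for illustration; the values are chosen so that one item occurs twice at rank 4 and another occurs thrice at rank 7, as in the text.

```python
def average_ranks(values):
    """Descending ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1     # first rank occupied by this value
        count = order.count(v)         # number of items tied at this value
        ranks.append(first + (count - 1) / 2)   # mean of first .. first+count-1
    return ranks

# Hypothetical series: 70 occurs twice at rank 4, 50 occurs thrice at rank 7.
values = [100, 90, 80, 70, 70, 60, 50, 50, 50, 40]
ranks = average_ranks(values)
print(ranks)
```

The two tied items share rank (4+5)/2 = 4.5 and the next item gets rank 6; the three tied items share rank (7+8+9)/3 = 8 and the next item gets rank 10.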

If only a small proportion of the ranks are tied, this technique 
may be applied together with formula (8.15). If a large proportion 
of ranks are tied, it is advisable to apply an adjustment or a correc- 
tion factor (C.F.) to (8.15) as explained below. 


“In the formula (8.15) add the factor m(m² - 1)/12 to Σd², where
m is the number of times an item is repeated. This correction factor
is to be added for each repeated value in both the series.”


Example 8.19. A psychologist wanted to compare two methods
A and B of teaching. He selected a random sample of 22 students.
He grouped them into 11 pairs so that the students in a pair have
approximately equal scores on an intelligence test. In each pair one
student was taught by method A and the other by method B and
examined after the course. The marks obtained by them are tabulated
below :

Pair :  1   2   3   4   5   6   7   8   9  10  11
A :    24  29  19  14  30  19  27  30  20  28  11
B :    37  35  16  26  23  27  19  20  16  11  21

Find the rank order correlation coefficient.
(Himachal Pradesh Uni. MBA 1973)

Solution. Let variable X denote the scores of students taught
by method A and Y denote the scores of students taught by
method B.


CALCULATION OF RANK CORRELATION COEFFICIENT

 X     Y    Rank of X (x)   Rank of Y (y)   d = x - y     d²
24    37        6                1             5         25.00
29    35        3                2             1          1.00
19    16        8.5              9.5          -1          1.00
14    26       10                4             6         36.00
30    23        1.5              5            -3.5       12.25
19    27        8.5              3             5.5       30.25
27    19        5                8            -3          9.00
30    20        1.5              7            -5.5       30.25
20    16        7                9.5          -2.5        6.25
28    11        4               11            -7         49.00
11    21       11                6             5         25.00
                                             Σd = 0    Σd² = 225.00


In the X-series, we see that the value 30 occurs twice. The
common rank assigned to each of these values is 1.5, the arithmetic
mean of 1 and 2, the ranks which these observations would have
taken if they were different. The next value 29 gets the next rank,
viz., 3. Again, the value 19 occurs twice. The common rank assigned
to it is 8.5, the arithmetic mean of 8 and 9, and the next value,
viz., 14 gets the rank 10. Similarly, in the Y-series the value 16
occurs twice and the common rank assigned to each is 9.5, the
arithmetic mean of 9 and 10. The next value, viz., 11 gets the rank
11.

Hence we see that in the X-series the items 19 and 30 are
repeated, each occurring twice, and in the Y-series the item 16 is
repeated. Thus in each of the three cases m=2. Hence on applying
the correction factor m(m² - 1)/12 for each repeated item we get,

\rho = 1 - \frac{6\left[\sum d^2 + \frac{2(4-1)}{12} + \frac{2(4-1)}{12} + \frac{2(4-1)}{12}\right]}{11(121 - 1)}
     = 1 - \frac{6 \times 226.5}{11 \times 120} = 1 - 1.0295 = -0.0295
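
The tie-corrected calculation of Example 8.19 can be sketched in Python as below. The helper names average_ranks and tie_correction are assumptions introduced for illustration; the correction term is the factor m(m² - 1)/12 quoted above, summed over every repeated value in a series.

```python
from collections import Counter

def average_ranks(values):
    """Descending ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 + (order.count(v) - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m > 1 times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

# Data of Example 8.19.
x = [24, 29, 19, 14, 30, 19, 27, 30, 20, 28, 11]   # method A
y = [37, 35, 16, 26, 23, 27, 19, 20, 16, 11, 21]   # method B
n = len(x)

sum_d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
corrected = sum_d2 + tie_correction(x) + tie_correction(y)
rho = 1 - 6 * corrected / (n * (n * n - 1))

print(sum_d2, corrected, round(rho, 4))
```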
Example 8.20. The coefficient of rank correlation of the marks
obtained by 10 students in two particular subjects was found to be 0.5.
It was later discovered that the difference in ranks in two subjects
obtained by one of the students was wrongly taken as 3 instead of 7.
What should be the correct value of coefficient of rank correlation ?

[Osmania U. B.Com. III Oct. 1983 ;
I.C.W.A. (Final) June 1981]

Solution. We are given n=10, ρ=0.5. Using (8.15) we
get

0.5 = 1 - \frac{6 \sum d^2}{10(100 - 1)} = 1 - \frac{6 \sum d^2}{10 \times 99}
\Rightarrow \frac{6 \sum d^2}{990} = 0.5
\Rightarrow \sum d^2 = \frac{990}{6 \times 2} = 82.5

Since one difference was wrongly taken as 3 instead of 7,
the correct value of Σd² is given by :

\text{Corrected } \sum d^2 = 82.5 - 3^2 + 7^2 = 82.5 - 9 + 49 = 122.5
\therefore \text{Corrected } \rho = 1 - \frac{6 \times 122.5}{10 \times 99} = 1 - \frac{49}{66}
= 1 - 0.7424 = 0.2576
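
The correction in Example 8.20 amounts to inverting formula (8.15), adjusting Σd², and applying the formula again. A minimal sketch, with the function names assumed for illustration:

```python
def sum_d2_from_rho(rho, n):
    """Invert formula (8.15): recover the sum of d^2 from rho and n."""
    return (1 - rho) * n * (n * n - 1) / 6

def rho_from_sum_d2(sum_d2, n):
    """Formula (8.15)."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

n = 10
reported_sum = sum_d2_from_rho(0.5, n)            # value implied by the wrong rho
corrected_sum = reported_sum - 3 ** 2 + 7 ** 2    # replace the wrong d=3 by d=7
corrected_rho = rho_from_sum_d2(corrected_sum, n)

print(reported_sum, corrected_sum, round(corrected_rho, 4))
```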
8.7.3. Remarks on Spearman's Rank Correlation Coefficient.
1. We always have Σd=0, which provides a check for
numerical calculations.

2. Since Spearman's rank correlation coefficient ρ is nothing
but the Pearsonian correlation coefficient between the ranks, it can be
interpreted in the same way as Karl Pearson's correlation
coefficient.

3. Karl Pearson's correlation coefficient assumes that the
parent population from which sample observations are drawn is
normal. If this assumption is violated, a distribution-free
(non-parametric) measure such as Spearman's ρ is needed,
since no strict assumptions are made about the form of the
population from which the sample observations are drawn.

4. Spearman's formula is easy to understand and apply as
compared with Karl Pearson's formula. The values obtained by
the two formulae, viz., Pearsonian r and Spearman's ρ, are generally
different. The difference arises due to the fact that when ranking
is used instead of the full set of observations, there is always some loss
of information. Unless many ties exist, the coefficient of rank
correlation should be only slightly lower than the Pearsonian
coefficient.


5. Spearman’s formula is the only formula to be used for 
finding correlation coefficient if we are dealing with qualitative 
characteristics which cannot be measured quantitatively but can be 
arranged serially. It can also be used where actual data are given. 
In case of extreme observations, Spearman’s formula is preferred to 
Pearson's formula.


6. Spearman’s formula has its limitations also. It is not 
practicable in the case of bivariate frequency distribution (Correla- 
tion Table). For n>30, this formula should not be used unless the 
ranks are given, since in the contrary case the calculations are quite 
time consuming. 


EXERCISE 8.4 


1. (a) What is Spearman's rank correlation coefficient ? Discuss its
usefulness.

(b) Explain the difference between Karl Pearson's (product moment)
correlation coefficient and rank correlation coefficient.

2. (a) What are the advantages of Spearman's rank correlation
coefficient over Karl Pearson's correlation coefficient ? Explain the method of
calculating Spearman's correlation coefficient.

[Shivaji Uni. B.Com., April 1981]

(b) Define rank correlation coefficient. When is it preferred to Karl
Pearson's coefficient of correlation ? [Delhi U. B.Com. (Hons.) II, 1984]
(c) What do you understand by rank correlation ? How is it
determined ? [C.A. (Intermediate), May 1982]


3. The ranks of the same 16 students in Mathematics and Physics are as
follows. The two numbers within brackets denote the ranks of the students in
Mathematics and Physics.

(1, 1) (2, 10) (3, 3) (4, 4) (5, 5) (6, 7) (7, 2) (8, 6) (9, 8) (10, 11) (11, 15)
(12, 9) (13, 14) (14, 12) (15, 16) (16, 13).

Calculate the rank correlation coefficient for proficiencies of this group
in Mathematics and Physics.

Ans. ρ=0.8
4. Two judges in a beauty competition rank the 12 entries as 
follows : 


а Е ВАЧ, Se дй. ih LE 12 
eae a о 6 10-3 tg САВ 2c d 1 


What degree of agreement is there between the two judges. 
(Punjab Uni. B.Com. 1977, Sept. 1976) 
Ans. ρ=-0.454


5. Twelve entries in a painting competition were ranked by two judges as
shown below :

Entry :     A   B   C   D   E   F   G   H   I   J   K   L
Judge I :   5   2   3   4   1   6   8   7  10   9  12  11
Judge II :  4   5   2   1   6   7  10   9  11  12   3   8

Find the coefficient of rank correlation.
[Punjab Uni. B.A. (Econ. Hons.), 1977]
Ans. ρ=0.46


6. Ten competitors in a beauty contest are ranked by three judges in
the following order :

1st Judge :  1   5   4   8   9   6  10   7   3   2
2nd Judge :  4   8   7   6   5   9  10   3   2   1
3rd Judge :  6   7   8   1   5  10   9   2   3   4

Use the rank correlation coefficient to discuss which pair of judges has
the nearest approach to common tastes in beauty.

[Guru Nanak Dev Uni. B.Com., 1978 ; Shivaji Uni. B.Com., April 1982]
Ans. ρ12=0.5515, ρ13=0.0545, ρ23=0.7333

The pair of 2nd and 3rd judges has the nearest approach to common
tastes in beauty.


7. Ten competitors in a beauty contest are ranked by three judges as 


follows ; 
Competitors 
Judges : 1 2 3 c RCRUM FAIRE! CUM У) 10 
ASG 5 3 ЛО АО 8 1 
BONES 8 4 7:0. 2:291 35:61. 7) 3 
Соу 9 8 Жы 35 10 175: 37. 6 


Discuss which pair of judges has the nearest approach to common tastes 
of beauty. 


Ans. pap —pae—0 704, — py, —0*3 
Pair of judges A and B or A and C have the nearest approach to
common tastes in beauty.
8. Calculate Spearman's coefficient of rank correlation for the
following data :
x : 5 22 148 251 83 4 25 92 70 164
y : 84 385 20 10 292 152 86 120 301 144
(Bombay Uni. B.Com., May 1982)

Ans. ρ=-0.7212


9. The following are the marks obtained by a group of students in two
papers. Calculate the rank coefficient of correlation.

Economics :  78 36 98 25 75 82 92 62 65 39
Statistics : 84 51 91 69 68 62 86 58 35 49
(Guru Nanak Dev Uni. B.Com. II, Sept. 1983)
Ans. ρ=0.6121


10. Calculate Spearman's coefficient of rank correlation for the
following data of scores in psychological tests (x) and arithmetical ability (y) of 10
children.

Child : A B C D E F G H I J
x : 105 104 102 101 100 99 98 96 93 92
y : 101 103 100 98 95 96 104 92 97 9
(Bombay Uni. B.Com., April 1983)
Ans. ρ=0.6
11. Find the coefficient of rank correlation for the following 
data : 
x 4 38 23 330 4 6 6 55 14 6 47 
y H 8 8 19 1 10 0 15 4 1 14 
(Himachal Pradesh Uni, B.Com., April 1981) 
Ans. ρ=0.7333
12. Find the coefficient of rank correlation between the marks obtained
in Mathematics (x) and those in Statistics (y) by 10 students of a certain class out
of a total of 50 marks in each subject.
Student No. :  1  2  3  4  5  6  7  8  9 10
x :           12 18 32 18 25 24 25 40 38 22
y :           16 15 28 16 24 22 28 36 34 19
[Himachal Pradesh Uni. M.A. (Econ.), July 1984]

Ans. ρ=0.95

13. From the following data, calculate the coefficient of rank correlation
between x and y.

x : 32 35 49 60 43 37 43 49 10 20
y : 40 30 70 20 30 50 72 60 45 25

[Poona Uni. B.Com., Oct. 1980]
Ans. ρ=0.0758


14. The value of Spearman's rank correlation coefficient for a certain
number of pairs of observations was found to be 2/3. The sum of squares of the
differences between corresponding ranks was 55. Find the number of pairs.

(Bombay Uni. B.Com., May 1980)

Ans. n=10

15. Coefficient of correlation between debenture prices and share prices
is found to be 0.143. If the sum of the squares of differences in ranks is given
to be 48, find the value of n.

Ans. n=7
16. The coefficient of rank correlation of the marks obtained by 10
students in biology and chemistry was found to be 0.8. It was later discovered
that the difference in ranks in the two subjects obtained by one of the students
was wrongly taken as 7 instead of 9. Find the correct coefficient of rank
correlation.

[Delhi Uni. B.A. (Econ. Hons. I), 1984 ; Bombay Uni. B.Com., 1976]

Ans. Correct value of ρ=0.6061


17. Mention the correct answer.
The ranks according to two attributes in a sample are given
below :
R1 : 1 2 3 4 5
R2 : 5 4 3 2 1
The rank correlation between them is :
0, +1, -1, none of these.
[C.A. (Intermediate) N.S. Nov., 1987]
Ans. ρ=-1


8.8. Method of Concurrent Deviations. This is a very
casual method of determining the correlation between two series
when we are not very serious about its precision. It is based on
the signs of the deviations (i.e., direction of the change) of the
values of the variable from its preceding value and does not take
into account the exact magnitude of the values of the variables.
Thus we put a plus (+) sign, minus (-) sign or equality (=) sign
for the deviation if the value of the variable is greater than, less
than or equal to the preceding value respectively. The deviations in
the values of two variables are said to be concurrent if they have
the same sign, i.e., either both deviations are positive or both are
negative or both are equal. The formula used for computing the
correlation coefficient r by this method is given by

r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)}    ...(8.17)

where c is the number of pairs of concurrent deviations and n is
the number of pairs of deviations. In the formula (8.17) the plus/minus
sign to be taken inside and outside the square root is of
fundamental importance. Since -1 ≤ r ≤ 1, the quantity inside the square
root, viz., ±((2c - n)/n), must be positive, otherwise r will be
imaginary, which is not possible.

Thus if (2c - n) is positive, we take the positive sign in and
outside the square root in (8.17) and if (2c - n) is negative, we take
the negative sign in and outside the square root in (8.17).

Remarks 1. It should be clearly noted that here n is not the
number of pairs of observations but it is the number of pairs
of deviations and as such it is one less than the number of pairs
of observations.
2. r computed by formula (8.17) is also known as the
coefficient of concurrent deviations.

3. Coefficient of concurrent deviations is primarily based on
the following principle :

“If the short time fluctuations of the time series are positively
correlated or in other words, if their deviations are concurrent, their
curves would move in the same direction and would indicate positive
correlation between them.”

Thus r computed from (8.17) ordinarily indicates the
relationship between short time fluctuations only.


Example 8.21. Calculate the coefficient of concurrent
deviations from the data given below :

Year :   1973 1974 1975 1976 1977 1978 1979 1980 1981
Supply :  160  164  172  182  166  170  178  192  186
Price :   292  280  260  234  266  254  230  190  200

Solution.

CALCULATION OF COEFFICIENT OF CONCURRENT DEVIATIONS

Year   Supply   Sign of deviation from   Price   Sign of deviation from   Product
                preceding value (x)              preceding value (y)       (xy)
1973    160                               292
1974    164            +                  280            -                   -
1975    172            +                  260            -                   -
1976    182            +                  234            -                   -
1977    166            -                  266            +                   -
1978    170            +                  254            -                   -
1979    178            +                  230            -                   -
1980    192            +                  190            -                   -
1981    186            -                  200            +                   -

Here we have
n = number of pairs of deviations = 8
c = 0, since no product of deviations xy is positive, i.e., there is
no pair of deviations having like signs.

Coefficient of concurrent deviations is given by

r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} = \pm \sqrt{\pm \left( \frac{2 \times 0 - 8}{8} \right)} = \pm \sqrt{\pm (-1)}

Since 2c - n = -8 (negative), we take the negative sign inside
and outside the square root to get

r = -\sqrt{-(-1)} = -\sqrt{1} = -1

Hence there is perfect negative correlation between the
supply and the price.


Example 8.22. Calculate coefficient of correlation by the
Concurrent Deviation Method.

Supply : 112 125 126 118 118 121 125 125 131 135
Price :  106 102 102 104  98  96  97  97  95  90
[Maharishi Dayanand Uni. (Rohtak) B.Com., Sept. 1980]
Solution.
CALCULATIONS FOR COEFFICIENT OF CONCURRENT DEVIATIONS

Supply   Sign of deviation from   Price   Sign of deviation from   Concurrent
         preceding value                  preceding value          deviations
 112                               106
 125            +                  102            -
 126            +                  102            =
 118            -                  104            +
 118            =                   98            -
 121            +                   96            -
 125            +                   97            +                   +(C)
 125            =                   97            =                   =(C)
 131            +                   95            -
 135            +                   90            -

We have : n = No. of pairs of deviations = 10 - 1 = 9
c = No. of concurrent deviations
= No. of deviations having like signs = 2
Coefficient of correlation by the method of concurrent
deviations is given by :

r = \pm \sqrt{\pm \left( \frac{2c - n}{n} \right)} = \pm \sqrt{\pm \left( \frac{4 - 9}{9} \right)} = \pm \sqrt{\pm (-0.5556)}

Since 2c - n = -5 (negative), we take the negative sign inside
and outside the square root.

\therefore r = -\sqrt{-(-0.5556)} = -\sqrt{0.5556} = -0.7454

Hence there is a fairly good degree of negative correlation
between supply and price.
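
The method of concurrent deviations is easy to mechanise. The sketch below is illustrative only: the function names are assumptions, and the sign rule of formula (8.17) is implemented by taking the square root of the absolute value of (2c - n)/n and attaching the sign of (2c - n).

```python
import math

def sign(previous, current):
    """+1, -1 or 0 according as the value rises, falls or is unchanged."""
    return (current > previous) - (current < previous)

def concurrent_deviation_coefficient(x, y):
    """Return (c, n, r) using formula (8.17)."""
    sx = [sign(a, b) for a, b in zip(x, x[1:])]
    sy = [sign(a, b) for a, b in zip(y, y[1:])]
    n = len(sx)                                   # pairs of deviations
    c = sum(1 for p, q in zip(sx, sy) if p == q)  # concurrent deviations
    t = (2 * c - n) / n
    return c, n, math.copysign(math.sqrt(abs(t)), t)

# Example 8.21
supply_21 = [160, 164, 172, 182, 166, 170, 178, 192, 186]
price_21 = [292, 280, 260, 234, 266, 254, 230, 190, 200]
c21, n21, r21 = concurrent_deviation_coefficient(supply_21, price_21)

# Example 8.22
supply_22 = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
price_22 = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]
c22, n22, r22 = concurrent_deviation_coefficient(supply_22, price_22)

print(c21, n21, r21)
print(c22, n22, round(r22, 4))
```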


EXERCISE 8.5 


1. (a) Explain the method of concurrent deviations for computing the
correlation between two variable series.


(b) Give the points of strength and weakness of finding out the rela- 
tionship between two variables by the method of concurrent deviations. 


(Punjab Uni. B.Com., 1978) 


2. Obtain the coefficient of correlation between price of rice and rainfall
from the data given below by means of concurrent deviations.

Year    Price of rice            Annual rainfall
        (Rs. per quintal)        (in centimetres)
1959         175                      315
1960         160                      340
1961         158                      350
1962         200                      350
1963         198                      330
1964         195                      335
1965         196                      353
1966         190                      333
1967         191                      390
1968         195                      340
1969         196                      380
1970         204                      340

Ans. r=-0.3015


3. Calculate the coefficient of correlation by the method of concurrent 
deviations from the following data : 


Year 1961 1962 1963 1964 1965 1966 1967 1968 1969 
Supply 80 82 86 91 83 85 89 96 93 
Price 146 140 130 117 133 127 115 95 100 
[Punjab Uni. B.Com., 1972]

Ans. r=-1.

4. Compute the coefficient of correlation of the following table (by the
method of concurrent deviations) relating to the marks obtained by students in
History and Geography.

Student :             1  2  3  4  5  6  7  8  9 10 11 12
Marks in History :   65 40 35 75 63 80 35 20 85 65 55 33
Marks in Geography : 30 55 68 28 76 25 80 85 20 35 45 65

Ans. r=-1.
5. Calculate correlation coefficient by concurrent deviations method :

x : 150 135  90 140 100
y :  60  50 100  80  90
(Osmania Uni. B.Com. III, 1983)

Ans. r=-0.7071
6. Calculate coefficient of concurrent deviations from the following
data :
x : 100 120 135 135 115 110 110
y :  50  40  60  60  80  55  65
(Himachal Pradesh Uni. B.Com. April, 1982)
Ans. r=0
7. Calculate the coefficient of concurrent deviations from the following
data :

No. of pairs of observations=96

No. of pairs of concurrent deviations=36.
(Osmania Uni. B.Com. III, April 1984)

Ans. r=-0.492


8.9. Coefficient of Determination. Coefficient of correlation
between two variable series is a measure of linear relationship
between them and indicates the amount of variation of one variable
which is associated with or is accounted for by another variable. A
more useful and readily comprehensible measure for this purpose is
the coefficient of determination which gives the percentage variation
in the dependent variable that is accounted for by the independent
variable. In other words, the coefficient of determination gives the
ratio of the explained variance to the total variance. The coefficient
of determination is given by the square of the correlation coefficient,
i.e., r². Thus,

Coefficient of determination

r^2 = \frac{\text{Explained Variance}}{\text{Total Variance}}    ...(8.18)


The coefficient of determination is a much more useful and better
measure for interpreting the value of r. According to Tuttle :


“The coefficient of correlation has been grossly overrated and is 
used entirely too much. Its square, the coefficient of determination is 
a much more useful measure of the linear covariation of two variables. 
The reader should develop the habit of squaring every correlation co- 
efficient he finds cited or stated before coming to any conclusion about 
the extent of the linear relationship between the two correlated 


variables.” 


For example, if the value of r=0.8, we cannot conclude that
80% of the variation in the relative series (dependent variable) is
due to the variation in the subject series (independent variable). But
the coefficient of determination in this case is r²=0.64 which implies
that only 64% of the variation in the relative series has been
explained by the subject series and the remaining 36% of the variation
is due to other factors.


By the same argument while comparing two correlation coeffi- 
cients, one of which is 0.4 and the other is 0.8 it is misleading to 
conclude that the correlation in the second case is twice as high as 
correlation in the first case. The coefficient of determination clearly 
explains this viewpoint, since in the case r=0.4, the coefficient of 
determination is 0.16 and in the case r=0.8, the coefficient of
determination is 0.64, from which we conclude that correlation in
the second case is four times as high as correlation in the first case. 


Remarks 1. The above discussion implies that : 


“The closeness of the relationship between two variables as
determined by correlation coefficient r is not proportional.”

2. The following table gives the values of the coefficient of
determination (r²) for different values of r.

 r      r²         r      r²
0.1    0.01       0.6    0.36
0.2    0.04       0.7    0.49
0.3    0.09       0.8    0.64
0.4    0.16       0.9    0.81
0.5    0.25       1.0    1.00

It may be seen from the above table that as the value of r
decreases, r² decreases very rapidly except in two particular cases
r=0 and r=1 when we get r²=r.


3. Coefficient of determination is always non-negative and
as such it does not tell us about the direction of the relationship
(whether it is positive or negative) between the two series.

4. Coefficient of Non-Determination. The ratio of the
unexplained variation to the total variation is called the coefficient of
non-determination. It is usually denoted by K² and is given by the
formula :

K^2 = \frac{\text{Unexplained Variance}}{\text{Total Variance}}
    = 1 - \frac{\text{Explained Variance}}{\text{Total Variance}}
    = 1 - r^2    ...(8.19)

5. Coefficient of Alienation. The coefficient of alienation is
given by the square root of the coefficient of non-determination,
i.e., by K as given below :

K = \pm \sqrt{1 - r^2}    ...(8.20)
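
The three measures of this section follow directly from r. The sketch below is illustrative, with the function name determination_summary assumed; it returns r², K² and the positive root K, and compares the cases r=0.4 and r=0.8 discussed above.

```python
import math

def determination_summary(r):
    """Return (r^2, K^2, K) as in (8.18), (8.19) and (8.20); K is the positive root."""
    r2 = r * r            # coefficient of determination
    k2 = 1 - r2           # coefficient of non-determination
    return r2, k2, math.sqrt(k2)

r2_high, k2_high, k_high = determination_summary(0.8)
r2_low, k2_low, k_low = determination_summary(0.4)
ratio = r2_high / r2_low   # how many times more variation is explained

print(round(r2_high, 2), round(k2_high, 2), round(k_high, 2))
print(round(r2_low, 2), round(ratio, 2))
```

With r=0.8, 64% of the variation is explained and 36% is unexplained, and the explained share is four times that for r=0.4.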
EXERCISE 8.6 


1. What is the coefficient of determination ? How is it useful in
interpreting the value of an observed correlation coefficient r ? Explain with the
help of an example.

2. Explain the terms :
(i) Coefficient of non-determination,
(ii) Coefficient of alienation,
and give their physical interpretation.
3. A correlation coefficient of 0.5 does not mean that 50% of the data
are explained. Comment. [Delhi Uni. B.A. Econ. (Hons.), 1970]
Ans. Statement is true. Only 25% of the variation is explained.

4. The coefficient of correlation between consumption expenditure (c)
and disposable income (y) in a study was found to be +0.8. What percentage
of variation in c is explained by variation in y ?

[Delhi Uni. B.A. Econ. (Hons.), 1973]

Ans. 64% of the variation in c is explained by variation in y.

5. A correlation between two variables has a value r=0.6 and a
correlation between other two variables is 0.3. Does it mean that the first correlation
is twice as strong as the second ?

Ans. No.

6. Comment on the following :

“The closeness of the relationship between two variables as determined
by r, the correlation coefficient between them, is proportional.”

Ans. Statement is wrong.

7. Do you agree with the statement : “r=0.8 implies that 80% of the
data are explained.”

Ans. No. Only 64% of the data are explained.


9 


Linear Regression Analysis 


9.1. Introduction. The literal or dictionary meaning of the 
word ‘Regression’ is ‘stepping back or returning to the average value.’ 
The term was first used by British biometrician Sir Francis Galton 
in the later part of the 19th century in connection with some 
studies he made on estimating the extent to which the stature of 
the sons of tall parents reverts or regresses back to the mean stature 
of the population. He studied the relationship between the heights 
of about one thousand fathers and sons and published the results 
in a paper ‘Regression towards Mediocrity in Hereditary Stature’. 
The interesting features of his study were : 


(i) The tall fathers have tall sons and short fathers have short 
sons. 


(ii) The average height of the sons of group of tall fathers is 
less than that of the fathers and the average height of the sons of 
a group of short fathers is more than that of the fathers. 


In other words, Galton's studies revealed that the offspring
of abnormally tall or short parents tend to revert or step back to
the average height of the population, a phenomenon which Galton
described as Regression to Mediocrity.

He concluded that if the average height of a certain group
of fathers is 'a' cms. above (below) the general average height then
the average height of their sons will be (a×r) cms. above (below) the
general average height, where r is the correlation coefficient between
the heights of the given group of fathers and their sons. In this
case correlation is positive and since |r| ≤ 1 we have a×r ≤ a.
This supports the result in (ii) above.


But today the word regression as used in Statistics has a much
wider perspective without any reference to biometry. Regression
analysis, in the general sense, means the estimation or prediction of
the unknown value of one variable from the known value of the
other variable. It is one of the very important statistical tools which
is extensively used in almost all sciences, whether natural, social or
physical. It is specially used in business and economics to study the
relationship between two or more variables that are related causally
and for estimation of demand and supply curves, cost functions,
production and consumption functions, etc.


Prediction or estimation is one of the major problems in
almost all spheres of human activity. The estimation or prediction
of future production, consumption, prices, investments, sales,
profits, income, etc., is of paramount importance to a businessman
or economist. Population estimates and population projections
are indispensable for efficient planning of an economy. The
pharmaceutical concerns are interested in studying or estimating
the effect of new drugs on patients. Regression analysis is one of
the most scientific techniques for making such predictions. In the
words of M.M. Blair, "Regression analysis is a mathematical measure
of the average relationship between two or more variables in terms of
the original units of the data".


We come across a number of inter-related events in our day- 
to-day life. For instance the yield of a crop depends on the rainfall, 
the cost or price of a product depends on the production and adver- 
tising expenditure, the demand for a particular product depends on 
its price, expenditure of a person depends on his income and so on. 
The regression analysis confined to the study of only two variables 
at a time is termed as simple regression. But quite often the values 
of a particular phenomenon may be affected by multiplicity of 
factors. The regression analysis for studying more than two varia- 
bles at a time is known as multiple regression. However, in this 
chapter we shall confine ourselves to simple regression only. 


In regression analysis there are two types of variables. The
variable whose value is influenced or is to be predicted is called the
dependent variable and the variable which influences the values or
is used for prediction is called the independent variable. In regression
analysis the independent variable is also known as the regressor or predictor
or explanator, while the dependent variable is also known as the regressed
or explained variable.


9.2. Linear and Non-Linear Regression. If the given
bivariate data are plotted on a graph, the points so obtained on the
scatter diagram will more or less concentrate round a curve, called
the ‘curve of regression’. Often such a curve is not distinct and is
quite confusing and sometimes complicated too. The mathematical 
equation of the regression curve, usually called the regression 
equation, enables us to study the average change in the value of the 
dependent variable for any given value of the independent variable. 


If the regression curve is a straight line, we say that there is 
linear regression between the variables under study. The equation of 
such a curve is the equation of a straight line, i.e., a first degree 
equation in the variables x and y. In case of linear regression the 
values of the dependent variable increase by a constant absolute 
amount for a unit change in the value of the independent variable. 
However, if the curve of regression is not a straight line, the
regression is termed as curved or non-linear regression. The regression
equation will be a functional relation between x and y involving
terms in x and y of degree higher than one, i.e., involving terms of
the type $x^2$, $y^2$, $xy$, etc. However, in this chapter we shall confine
our discussion to linear regression between two variables only.

9.3. Lines of Regression. Line of regression is the line
which gives the best estimate of one variable for any given value
of the other variable. In case of two variables x and y, we shall
have two lines of regression; one of y on x and the other of x
on y.

Definition. Line of regression of y on x is the line which gives
the best estimate for the value of y for any specified value of x.

Similarly, line of regression of x on y is the line which gives the
best estimate for the value of x for any specified value of y.

The term best fit is interpreted in accordance with the
Principle of Least Squares, which consists in minimising the sum of the
squares of the residuals or the errors of estimates, i.e., the deviations
between the given observed values of the variable and their corresponding
estimated values as given by the line of best fit. We may minimise
the sum of the squares of the errors parallel to the y-axis or
parallel to the x-axis. The former, i.e., minimising the sum of squares of
errors parallel to the y-axis, gives the equation of the line of regression
of y on x and the latter, viz., minimising the sum of squares of the
errors parallel to the x-axis, gives the equation of the line of regression
of x on y.

We shall explain below the technique of deriving the equation 
of the line of regression of y on x. 

9.3.1. Derivation of Line of Regression of y on x. Let $(x_1, y_1)$,
$(x_2, y_2)$, ..., $(x_n, y_n)$ be n pairs of observations on the two variables
x and y under study. Let
$$y = a + bx \tag{9.1}$$
be the line of regression (best fit) of y on x.


[Fig. 9.1: scatter diagram with the line y = a + bx, showing a point $P_i(x_i, y_i)$ and the point $H_i(x_i, a + bx_i)$ on the line with the same x-coordinate.]


For any given point $P_i(x_i, y_i)$ in the scatter diagram, the error
of estimate or residual as given by the line of best fit (9.1) is $P_iH_i$.
Now, the x-coordinate of $H_i$ is the same as that of $P_i$, viz., $x_i$, and since
$H_i$ lies on the line (9.1), the y-coordinate of $H_i$, i.e., $H_iM$ (M being the
foot of the perpendicular from $P_i$ on the x-axis), is
given by $(a + bx_i)$. Hence the error of estimate for $P_i$ is given by
$$P_iH_i = P_iM - H_iM = y_i - (a + bx_i) \tag{9.2}$$
This is the error (parallel to the y-axis) for the i-th point. We
will have such errors for all the points on the scatter diagram. For the
points which lie above the line, the error would be positive and for
the points which lie below the line, the error would be negative.


According to the principle of least squares, we have to determine
the constants a and b in (9.1) such that the sum of the squares
of the errors of estimates is minimum. In other words, we have to
minimise
$$E = \sum_{i=1}^{n} P_iH_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2 \tag{9.3}$$
subject to variations in a and b.

We may also write E as :
$$E = \sum (y - y_e)^2 = \sum (y - a - bx)^2 \tag{9.3a}$$
where $y_e$ is the estimated value of y as given by (9.1) for a given
value of x and the summation ($\sum$) is taken over the n pairs of observations.

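Using the principle of maxima and minima in differential
calculus, E will have an extremum (maximum or minimum) for
variations in a and b if its partial derivatives w.r.t. a and b
vanish separately. Hence from (9.3a) we get
$$\frac{\partial E}{\partial a} = -2\sum (y - a - bx) = 0 \quad \text{and} \quad \frac{\partial E}{\partial b} = -2\sum x(y - a - bx) = 0 \tag{9.4}$$
$$\Rightarrow \quad \sum y = na + b\sum x \tag{9.5}$$
$$\text{and} \quad \sum xy = a\sum x + b\sum x^2 \tag{9.6}$$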


These equations are known as the normal equations for estimating
a and b. The quantities $\sum x$, $\sum x^2$, $\sum y$, $\sum xy$ can be obtained
from the given set of n points $(x_1, y_1)$, $(x_2, y_2)$, ..., $(x_n, y_n)$ and we
can solve the equations (9.5) and (9.6) simultaneously for a and b :
$$a = \frac{(\sum x^2)(\sum y) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \tag{9.7}$$
$$\text{and} \quad b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \tag{9.8}$$
Substituting these values of a and b from (9.7) and (9.8) in
(9.1), we get the required equation of the line of regression of y
on x.
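The solution of the normal equations can be sketched in a few lines of Python. This is only an illustration: the function name `fit_y_on_x` and the sample points are assumptions made here, not part of the text. It evaluates (9.7) and (9.8) directly from the four sums.

```python
def fit_y_on_x(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x via (9.7) and (9.8)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_xx - sum_x ** 2  # zero only when all x values coincide
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_xx * sum_y - sum_x * sum_xy) / denom  # intercept, formula (9.7)
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope, formula (9.8)
    return a, b


# Hypothetical points lying exactly on y = 1 + 2x.
a, b = fit_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

For points that lie exactly on a straight line, the fitted constants reproduce that line, which is a quick sanity check on the formulae.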

The equation of the line of regression of y on x can be obtained
in a much more systematic and simplified form in terms of $\bar{x}$, $\bar{y}$,
$\sigma_x$, $\sigma_y$ and $r = r_{xy}$ as explained below.

Dividing both sides of (9.5) by n, the total number of pairs,
we get
$$\frac{1}{n}\sum y = a + b \cdot \frac{1}{n}\sum x \quad \Rightarrow \quad \bar{y} = a + b\bar{x} \tag{9.9}$$
This implies that the line of best fit, i.e., regression of y on x, passes
through the point $(\bar{x}, \bar{y})$. In other words, the point $(\bar{x}, \bar{y})$ lies on
the line of regression of y on x.


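Dividing the numerator and the denominator of (9.8) by $n^2$, we get
$$b = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\frac{1}{n}\sum x^2 - \bar{x}^2} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} \tag{9.10}$$
Hence, subtracting (9.9) from (9.1) and using (9.10), the required equation
of the line of regression of y on x becomes :
$$y - \bar{y} = b(x - \bar{x}) \tag{9.11}$$
$$\text{or} \quad y - \bar{y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x}) \tag{9.12}$$
$$\Rightarrow \quad y - \bar{y} = \frac{r\sigma_y}{\sigma_x}(x - \bar{x}) \tag{9.13}$$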


Remarks 1. From (9.4) we have :
$$\sum (y - a - bx) = 0 \quad \Rightarrow \quad \sum (y - y_e) = 0 \tag{9.14}$$
where $y_e$ is the estimated value of y for a given value of x as given
by the line of regression of y on x (9.1).

2. The line of regression of y on x passes through the point $(\bar{x}, \bar{y})$.
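The two remarks above can be checked numerically. The sketch below uses a small illustrative data set and a helper named `fit_line` (both chosen here for demonstration, not taken from the derivation) to confirm that the residuals of the fitted line sum to zero and that the line passes through the point of means.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x, using b = Cov(x, y) / var(x) and (9.9)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar  # from (9.9): y_bar = a + b * x_bar
    return a, b


xs = [2, 4, 6, 8]        # illustrative values only
ys = [10, 20, 25, 30]
a, b = fit_line(xs, ys)

# Residual y - y_e for every point; by (9.14) these add up to zero.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(residuals, sum(residuals))
print(a + b * x_bar, y_bar)  # the fitted line evaluated at x_bar equals y_bar
```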

9.3.2. Line of Regression of x on y. The line of regression 
of x on y is the line which gives the best estimate of x for any given 
value of y. It is also obtained by the principle of least squares 
on minimising the sum of squares of the errors parallel to the 
x-axis (See Figure 9.2 below). By starting with the equation of the 


form :
$$x = A + By \tag{9.15}$$
and minimising the sum of the squares of errors of estimates of x,
i.e., deviations between the given values of x and their estimates
given by the line of regression of x on y, viz., (9.15), i.e., minimising
$$E = \sum (x - A - By)^2 \tag{9.16}$$
we shall get the normal equations for estimating A and B as :
$$\sum x = nA + B\sum y, \qquad \sum xy = A\sum y + B\sum y^2 \tag{9.17}$$

[Fig. 9.2: scatter diagram showing the errors of estimate measured parallel to the x-axis.]

Solving (9.17) simultaneously for A and B, we shall get :
$$A = \frac{(\sum y^2)(\sum x) - (\sum y)(\sum xy)}{n\sum y^2 - (\sum y)^2} \tag{9.18}$$
$$\text{and} \quad B = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} \tag{9.19}$$
Substituting these values of A and B in (9.15) we shall get the
required equation of the line of regression of x on y.

Remark. The values of A and B obtained in (9.18) and
(9.19) are the same as in equations (9.7) and (9.8) with x changed to y
and y to x.

Proceeding exactly as in the case of line of regression of y on
x, we shall get from (9.17) the following results :

(i) $$\bar{x} = A + B\bar{y} \tag{9.20}$$
This implies that the line of regression of x on y passes through the
point $(\bar{x}, \bar{y})$.

(ii) $$B = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y} \tag{9.21}$$

(iii) The equation of the line of regression of x on y is
$$x - \bar{x} = B(y - \bar{y}) \tag{9.22}$$
$$\Rightarrow \quad x - \bar{x} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}(y - \bar{y}) \tag{9.23}$$
$$\Rightarrow \quad x - \bar{x} = \frac{r\sigma_x}{\sigma_y}(y - \bar{y}) \tag{9.24}$$


Remarks 1. The regression equation (9.13) implies that the
line of regression of y on x passes through the point $(\bar{x}, \bar{y})$. Similarly,
(9.24) implies that the line of regression of x on y also passes
through the point $(\bar{x}, \bar{y})$. Hence both the lines of regression pass
through the point $(\bar{x}, \bar{y})$. In other words, the mean values $(\bar{x}, \bar{y})$
can be obtained as the point of intersection of the two regression
lines.


2. Why two lines of regression ? There are always two lines
of regression, one of y on x and the other of x on y. The line of
regression of y on x (9.12) or (9.13) is used to estimate or predict
the value of y for any given value of x, i.e., when y is a dependent
variable and x is an independent variable. The estimate so obtained
will be best in the sense that it will have the minimum possible
error as defined by the principle of least squares. We can also
obtain an estimate of x for any given value of y by using equation
(9.13) but the estimate so obtained will not be best since (9.13) is
obtained on minimising the sum of the squares of errors of estimates
in y and not in x. Hence to estimate or predict x for any given
value of y, we use the regression equation of x on y (9.24) which is
derived on minimising the sum of the squares of errors of estimates
in x. Here x is a dependent variable and y is an independent variable.
The two regression equations are not reversible or interchangeable
because of the simple reason that the basis and assumptions
for deriving these equations are quite different. The regression
equation of y on x is obtained on minimising the sum of the squares
of the errors parallel to the y-axis while the regression equation of
x on y is obtained on minimising the sum of squares of the errors
parallel to the x-axis.
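A small numerical sketch may make the point concrete. The data and the helper name `regression_coefficients` below are illustrative assumptions. Solving the y on x line for x gives a slope of 1/b_yx, which in general differs from the slope b_xy of the true line of regression of x on y.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy): slopes of the y on x line and of the x on y line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / s_xx, s_xy / s_yy


xs = [1, 2, 3, 4, 5]   # illustrative data, not from the text
ys = [2, 1, 4, 3, 5]
b_yx, b_xy = regression_coefficients(xs, ys)

# Slope obtained by (wrongly) solving the y on x line for x.
inverted_slope = 1 / b_yx
print(b_yx, b_xy, inverted_slope)
```

The two slopes agree only when r = ±1, i.e., when b_yx multiplied by b_xy equals 1.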


In the particular case of perfect correlation, positive or negative,
i.e., r = ±1, the equation of the line of regression of y on x becomes :
$$y - \bar{y} = \pm\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \quad \Rightarrow \quad \frac{y - \bar{y}}{\sigma_y} = \pm\frac{x - \bar{x}}{\sigma_x} \tag{*}$$
Similarly, the equation of the line of regression of x on y becomes :
$$x - \bar{x} = \pm\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \quad \Rightarrow \quad \frac{y - \bar{y}}{\sigma_y} = \pm\frac{x - \bar{x}}{\sigma_x}$$
which is the same as (*).

Hence in case of perfect correlation (r = ±1) both the lines
of regression coincide. Therefore, in general we always have two
lines of regression except in the particular case of perfect correlation
when both the lines coincide and we get only one line.


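9.3.3. Angle Between the Regression Lines.
If $\theta$ is the angle between the two lines of regression then
$$\tan\theta = \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\left(\frac{r^2 - 1}{r}\right) \quad \Rightarrow \quad \theta = \tan^{-1}\left\{\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\left(\frac{r^2 - 1}{r}\right)\right\} \tag{9.25}$$
In particular, if r = ±1 then
$$\theta = \tan^{-1}(0) \quad \Rightarrow \quad \theta = 0 \text{ or } \pi,$$
i.e., the two lines are either coincident ($\theta = 0$) or they are parallel
($\theta = \pi$). But since both the lines of regression intersect at the point
$(\bar{x}, \bar{y})$, they cannot be parallel. Hence in case of perfect correlation,
positive or negative, the two lines of regression coincide.

If r = 0, then from (9.25),
$$\theta = \tan^{-1}(\infty) = \pi/2$$
i.e., if the variables are uncorrelated, the two lines of regression
become perpendicular to each other.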


Remark. We have seen above that if r = 0 (variables uncorrelated),
the two lines of regression are perpendicular to each other and if
r = ±1, $\theta = 0$, i.e., the two lines coincide. This leads us to the conclusion
that for higher degree of correlation between the variables,
the angle between the lines is smaller, i.e., the two lines of regression
are nearer to each other. On the other hand, the angle between
the lines increases, i.e., the lines of regression move apart as the
numerical value of the correlation coefficient decreases. In other words, if the lines
of regression make a larger angle, they indicate a poor degree of
correlation between the variables, and ultimately for $\theta = \pi/2$ the
lines become perpendicular, i.e., no correlation exists between the
variables. Thus by plotting the lines of regression on a graph
paper, we can have an approximate idea about the degree of correlation
between the two variables under study. Some illustrations
are given in Fig. 9.3.

[Fig. 9.3: five panels showing the two lines of regression: two lines coincide (r = -1); two lines coincide (r = +1); two lines perpendicular (r = 0); two lines apart (low degree of correlation); two lines closer (high degree of correlation).]
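The behaviour described in this remark can be explored with a short sketch. The function name `angle_between_lines` is an assumption made here, and the absolute value is taken so that the acute angle is returned; this is a convenience, not something prescribed by (9.25).

```python
import math


def angle_between_lines(r, sigma_x, sigma_y):
    """Acute angle (in radians) between the two regression lines, based on (9.25)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if sigma_x <= 0 or sigma_y <= 0:
        raise ValueError("standard deviations must be positive")
    if r == 0:
        return math.pi / 2  # uncorrelated variables: perpendicular lines
    tan_theta = abs((r * r - 1) / r) * sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)
    return math.atan(tan_theta)


# The angle shrinks as |r| grows (equal standard deviations assumed for the demo).
for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(r, round(math.degrees(angle_between_lines(r, 1.0, 1.0)), 2))
```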


9.4. Coefficients of Regression. Let us consider the line of 
regression of y on x, viz., 
y=a+ bx 


The coefficient ‘b’ which is the slope of the line of regression
of y on x is called the coefficient of regression of y on x. It represents
the increment in the value of the dependent variable y for a
unit change in the value of the independent variable x. In other
words, it represents the rate of change of y w.r.t. x. For notational
convenience, the slope b, i.e., coefficient of regression of y on x, is
written as $b_{yx}$.

Similarly in the regression equation of x on y, viz.,
x = A + By,
the coefficient B represents the change in the value of dependent
variable x for a unit change in the value of independent variable y
and is called the coefficient of regression of x on y. For convenience,
it is written as $b_{xy}$.


From (9.10), the coefficient of regression of y on x is given by
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x} \tag{9.26}$$
because $\mathrm{Cov}(x, y) = r\sigma_x\sigma_y$.

Similarly from (9.21), the coefficient of regression of x on y is
given by :
$$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y} \tag{9.27}$$
Accordingly the equation of the line of regression of y on x becomes
$$y - \bar{y} = b_{yx}(x - \bar{x}) \tag{9.28}$$
and the equation of the line of regression of x on y becomes :
$$x - \bar{x} = b_{xy}(y - \bar{y}) \tag{9.29}$$


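Remarks 1. For numerical computations of the equations of the
lines of regression of y on x and x on y, the following formulae for
the regression coefficients $b_{yx}$ and $b_{xy}$ are very convenient to use.
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \tag{9.30}$$
$$\Rightarrow \quad b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \tag{9.31}$$
$$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \tag{9.32}$$
$$\Rightarrow \quad b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2} \tag{9.33}$$
Formulae (9.30) and (9.32) are very useful for computing the
values of regression coefficients from a given set of n points $(x_1, y_1)$,
$(x_2, y_2)$, ..., $(x_n, y_n)$.

Other convenient formulae to be used for finding the regression
coefficients for numerical problems are :
$$b_{yx} = \frac{r\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = \frac{r\sigma_x}{\sigma_y} \tag{9.34}$$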

2. Correlation coefficient between two variables x and y is a
symmetrical function between x and y, i.e., $r_{xy} = r_{yx}$. However, the
regression coefficients are not symmetric functions of x and y, i.e.,
$b_{yx} \ne b_{xy}$ in general.

3. We have :
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} \tag{*}$$
$$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} \tag{**}$$
$$\text{and} \quad r = \frac{\mathrm{Cov}(x, y)}{\sigma_x\sigma_y} \tag{***}$$
From (*) and (**) we observe that the sign of each regression
coefficient $b_{yx}$ and $b_{xy}$ depends on the covariance term since $\sigma_x > 0$
and $\sigma_y > 0$. If Cov (x, y) is positive, both the regression coefficients
are positive and if Cov (x, y) is negative, both the regression coefficients
are negative.

4. Further, since $\sigma_x > 0$ and $\sigma_y > 0$, the sign of each of r, $b_{yx}$
and $b_{xy}$ depends on the covariance term. If Cov (x, y) is positive,
all the three are positive and if Cov (x, y) is negative, all the
three are negative. This result can be stated slightly differently as
follows :

The sign of correlation coefficient is same as that of the regression
coefficients. If regression coefficients are positive, r is positive
and if regression coefficients are negative, r is negative.


9.4.1. Theorems on Regression Coefficients 


Theorem 9.1. The correlation coefficient is the geometric
mean between the regression coefficients, i.e.,
$$r^2 = b_{yx} \cdot b_{xy} \tag{9.35}$$
Proof. We have,
$$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x} \tag{9.36}$$
$$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y} \tag{9.37}$$
Multiplying (9.36) and (9.37) we get
$$r^2 = b_{yx} \cdot b_{xy} \quad \Rightarrow \quad r = \pm\sqrt{b_{yx} \cdot b_{xy}} \tag{9.38}$$
which establishes the result.

Remark. The sign to be taken before the square root is
same as that of regression coefficients. If the regression coefficients
are positive, we take positive sign in (9.38) and if regression coefficients
are negative we take negative sign in (9.38).
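The sign rule in this remark translates directly into code. The sketch below is illustrative only; the function name `correlation_from_coefficients` and its small tolerance for rounding are assumptions, not part of the theorem.

```python
import math


def correlation_from_coefficients(b_yx, b_xy):
    """Return r = ±sqrt(b_yx * b_xy), taking the common sign of the two coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1 + 1e-12:
        raise ValueError("b_yx * b_xy cannot exceed 1, since r**2 <= 1")
    # copysign attaches the sign of b_yx to the positive square root, as in (9.38)
    return math.copysign(math.sqrt(min(product, 1.0)), b_yx)


print(correlation_from_coefficients(1.6, 0.4))
print(correlation_from_coefficients(-1.5, -0.2))
```

The first call uses positive coefficients and gives a positive r; the second uses negative coefficients and gives a negative r.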


Theorem 9.2. If one of the regression coefficients is greater
than unity (one), the other must be less than unity.

Theorem 9.3. The arithmetic mean of the regression coefficients
is greater than or equal to the correlation coefficient, provided r is
positive, i.e.,
$$\tfrac{1}{2}(b_{yx} + b_{xy}) \ge r$$
Theorem 9.4. Regression coefficients are independent of
change of origin but not of scale.

Let us transform from x and y to new variables u and
v by change of origin and scale, viz.,
$$u = \frac{x - a}{h}, \qquad v = \frac{y - b}{k} \tag{9.39}$$
where a, b, h > 0 and k > 0 are constants.
We have :
$$b_{xy} = \frac{h}{k} \cdot b_{uv} \tag{9.40}$$
Also
$$b_{yx} = \frac{k}{h} \cdot b_{vu} \tag{9.40a}$$

In particular if we take h = k = 1, i.e., we transform the variables
x and y to u and v by the relation :
u = x - a and v = y - b,
i.e., by change of origin only, then from (9.40) and (9.40a) we get
$$b_{xy} = b_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2} \tag{9.41a}$$
$$\text{and} \quad b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} \tag{9.41b}$$

These formulae are very useful for obtaining the equations of
the lines of regression if the mean values $\bar{x}$ and/or $\bar{y}$ come out to
be in fractions or if the values of x and y are large.
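Theorem 9.4 can be verified numerically. In the sketch below the data, the constants a, b, h, k and the helper name `slope` are all illustrative assumptions. A change of origin alone leaves the regression coefficient unchanged, while a change of scale multiplies it by k/h.

```python
def slope(xs, ys):
    """Regression coefficient of ys on xs (least-squares slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


xs = [110, 120, 130, 150, 190]   # illustrative data only
ys = [15, 25, 20, 40, 50]
a, h = 130, 10                   # origin and scale used for x
b, k = 25, 5                     # origin and scale used for y

us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]

b_yx = slope(xs, ys)
b_vu = slope(us, vs)
print(b_yx, (k / h) * b_vu)      # equal: b_yx = (k/h) * b_vu

# Change of origin only (h = k = 1): the coefficient is unchanged.
b_shifted = slope([x - a for x in xs], [y - b for y in ys])
print(b_yx, b_shifted)
```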


Example 9.1. From the following data, obtain the two regression
equations :
Sales    : 91 97 108 121 67 124 51 73 111 57
Purchase : 71 75 69 97 70 91 39 61 80 47
[C.A. (Intermediate) May 1980, May 1977]


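Solution. Let us denote the sales by the variable X and the
purchases by the variable Y.

CALCULATIONS FOR REGRESSION EQUATIONS

Here $\bar{x} = 90$ and $\bar{y} = 70$. Writing $dx = x - \bar{x}$ and $dy = y - \bar{y}$, we have
$\sum dx^2 = 6360$, $\sum dy^2 = 2868$ and $\sum dx\,dy = 3900$, so that
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{3900}{6360} = 0.6132$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{3900}{2868} = 1.360$$
Equation of line of regression of y on x is
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$\Rightarrow \quad y - 70 = 0.6132(x - 90) = 0.6132x - 55.188$$
$$\Rightarrow \quad y = 0.6132x - 55.188 + 70.000$$
$$\Rightarrow \quad y = 0.6132x + 14.812$$
Equation of line of regression of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$\Rightarrow \quad x - 90 = 1.360(y - 70) = 1.360y - 95.20$$
$$\Rightarrow \quad x = 1.360y - 95.20 + 90.00$$
$$\Rightarrow \quad x = 1.360y - 5.20$$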

Example 9.2. From the data given below find : 
(a) The two regression equations. 


(b) The coefficient of correlation between the marks in Econo- 
mics and Statistics. 


(c) The most likely marks in Statistics when marks in Econo- 
mics are 30. 


Marks in Economics: 25 28 35 32 31 36 29 38 34 32 
Marks in Statistics : 43 46 49 41 36 32 31 30 33 39 
[Delhi Uni. B. Com. (Hons.), 1982] 


Solution. Let us denote the marks in Economics by the
variable X and the marks in Statistics by the variable Y.

CALCULATIONS FOR REGRESSION EQUATIONS

   x      y     dx = x - x̄    dy = y - ȳ     dx²     dy²     dxdy
                = x - 32      = y - 38

  25     43        -7             5          49      25      -35
  28     46        -4             8          16      64      -32
  35     49         3            11           9     121       33
  32     41         0             3           0       9        0
  31     36        -1            -2           1       4        2
  36     32         4            -6          16      36      -24
  29     31        -3            -7           9      49       21
  38     30         6            -8          36      64      -48
  34     33         2            -5           4      25      -10
  32     39         0             1           0       1        0

Σx = 320   Σy = 380   Σdx = 0   Σdy = 0   Σdx² = 140   Σdy² = 398   Σdxdy = -93
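Here,
$$\bar{x} = \frac{\sum x}{n} = \frac{320}{10} = 32 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{380}{10} = 38.$$
Coefficient of regression of y on x is
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum dx\,dy}{\sum dx^2} = \frac{-93}{140} = -0.6643$$
Similarly, coefficient of regression of x on y is
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum dx\,dy}{\sum dy^2} = \frac{-93}{398} = -0.2337$$
Regression Equations

Equation of the line of regression of x on y is :
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$\Rightarrow \quad x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 \times 38 = -0.2337y + 8.8806$$
$$\Rightarrow \quad x = -0.2337y + 32 + 8.8806$$
$$\Rightarrow \quad x = -0.2337y + 40.8806$$
Equation of the line of regression of y on x is :
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$\Rightarrow \quad y - 38 = -0.6643(x - 32)$$
$$\Rightarrow \quad y = -0.6643x + 38 + 0.6643 \times 32 = -0.6643x + 38 + 21.2576$$
$$\Rightarrow \quad y = -0.6643x + 59.2576 \tag{*}$$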
Correlation coefficient. We have
$$r^2 = b_{yx} \cdot b_{xy} = (-0.6643) \times (-0.2337) = 0.1552$$
$$\Rightarrow \quad r = \pm\sqrt{0.1552} = \pm 0.394$$
Since both the regression coefficients are negative, r must be
negative. Hence, discarding plus sign, we get
$$r = -0.394$$
In order to estimate the most likely marks in Statistics (y)
when marks in Economics (x) are 30, we shall use the line of
regression of y on x, viz., the equation (*). Taking x = 30 in (*), the
required estimate is given by
$$y = -0.6643 \times 30 + 59.2576 = -19.929 + 59.2576 = 39.3286$$
Hence the most likely marks in Statistics when marks in
Economics are 30 are 39.3286 ≈ 39.
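The arithmetic of Example 9.2 can be reproduced with a short script. This is only a verification sketch: the variable names are chosen here, and the script works with unrounded coefficients, so the last decimal places may differ slightly from the hand computation.

```python
import math

economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]   # x
statistics = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]  # y

n = len(economics)
x_bar = sum(economics) / n
y_bar = sum(statistics) / n

sum_dx2 = sum((x - x_bar) ** 2 for x in economics)
sum_dy2 = sum((y - y_bar) ** 2 for y in statistics)
sum_dxdy = sum((x - x_bar) * (y - y_bar) for x, y in zip(economics, statistics))

b_yx = sum_dxdy / sum_dx2   # coefficient of regression of y on x
b_xy = sum_dxdy / sum_dy2   # coefficient of regression of x on y

# r takes the common sign of the regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Most likely marks in Statistics when marks in Economics are 30.
estimate = y_bar + b_yx * (30 - x_bar)

print(round(b_yx, 4), round(b_xy, 4), round(r, 3), round(estimate, 2))
```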


Example 9.3. A panel of judges A and B graded seven debators 
and independently awarded the following marks : 
Debator    Marks by A    Marks by B
   1           40            32
   2           34            39
   3           28            26
   4           30            30
   5           44            38
   6           38            34
   7           31            28

An eighth debator was awarded 36 marks by Judge A while 
Judge B was not present. 

If judge B were also present, how many marks would you expect 
him to award to the eighth debator assuming that the same degree of 
relationship exists in their judgement ? 

[C.A. (Intermediate), May 1982]


Solution. Let the marks awarded by Judge ‘A’ be denoted
by the variable X and the marks awarded by Judge ‘B’ by the variable
Y.


CALCULATIONS FOR REGRESSION EQUATIONS 


Debator     x      y     u = x - A    v = y - B     u²      v²      uv
                         = x - 35     = y - 30

   1       40     32         5            2         25       4      10
   2       34     39        -1            9          1      81      -9
   3       28     26        -7           -4         49      16      28
   4       30     30        -5            0         25       0       0
   5       44     38         9            8         81      64      72
   6       38     34         3            4          9      16      12
   7       31     28        -4           -2         16       4       8

 Total                    Σu = 0      Σv = 17   Σu² = 206  Σv² = 185  Σuv = 121


The marks awarded by Judge A to the eighth debator are given
to be 36, i.e., we are given x = 36. We want to find the marks which
would have been given to the 8th debator by Judge B, if he were
present. In other words, we want to find y when x = 36. To do
this we need the equation of the line of regression of y on x. In the
usual notations we have :
$$\bar{x} = A + \frac{\sum u}{n} = 35 + \frac{0}{7} = 35$$
$$\bar{y} = B + \frac{\sum v}{n} = 30 + \frac{17}{7} = 30 + 2.4286 = 32.4286$$
$$b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{7 \times 121 - 0 \times 17}{7 \times 206 - 0} = \frac{121}{206} = 0.5874$$
The equation of the line of regression of y on x is given by
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$\Rightarrow \quad y - 32.4286 = 0.5874(x - 35) = 0.5874x - 0.5874 \times 35$$
$$\Rightarrow \quad y = 0.5874x - 20.5590 + 32.4286$$
$$\Rightarrow \quad y = 0.5874x + 11.8696$$
When x = 36,
$$y = 0.5874 \times 36 + 11.8696 = 21.1464 + 11.8696 = 33.016$$
Hence, if Judge B were also present, he would have given
33 marks to the eighth debator.
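The assumed-mean shortcut used in Example 9.3 can be sketched as follows. The variable names are illustrative, and the script keeps full precision rather than rounding intermediate values as in the hand computation.

```python
marks_a = [40, 34, 28, 30, 44, 38, 31]   # x: marks by Judge A
marks_b = [32, 39, 26, 30, 38, 34, 28]   # y: marks by Judge B
A, B = 35, 30                            # assumed means (change of origin only)

u = [x - A for x in marks_a]
v = [y - B for y in marks_b]
n = len(u)

sum_u, sum_v = sum(u), sum(v)
sum_uu = sum(ui * ui for ui in u)
sum_uv = sum(ui * vi for ui, vi in zip(u, v))

# b_yx = b_vu because regression coefficients are independent of change of origin.
b_yx = (n * sum_uv - sum_u * sum_v) / (n * sum_uu - sum_u ** 2)
x_bar = A + sum_u / n
y_bar = B + sum_v / n

# Expected marks from Judge B when Judge A awards 36.
estimate = y_bar + b_yx * (36 - x_bar)
print(round(b_yx, 4), round(y_bar, 4), round(estimate, 3))
```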


Example 9.4. Obtain the equations of the two lines of regression
for the following data :

X : 43 44 46 40 44 42 45 42 38 40 42 57
Y : 29 31 19 18 19 27 27 29 41 30 26 10

Hence obtain the value of the correlation coefficient between X
and Y.

Solution. Here $\sum x = 523$, $\sum y = 306$ and n = 12. Since $\bar{x}$ and $\bar{y}$
are not integers but come out to be fractions, the formula (9.30) or
(9.32) for obtaining the regression coefficients of the two lines of
regression will be very tedious and time consuming. Hence we shall
obtain the equations of the lines of regression by taking the
deviations of x and y from arbitrary values, say A = 44 for x-series
and B = 26 for y-series and then use the formulae (9.41a) and
(9.41b).

Let u = x - A = x - 44, v = y - B = y - 26.


CALCULATIONS FOR REGRESSION EQUATIONS

   x      y     u = x - 44    v = y - 26     u²      v²      uv

  43     29        -1             3           1       9      -3
  44     31         0             5           0      25       0
  46     19         2            -7           4      49     -14
  40     18        -4            -8          16      64      32
  44     19         0            -7           0      49       0
  42     27        -2             1           4       1      -2
  45     27         1             1           1       1       1
  42     29        -2             3           4       9      -6
  38     41        -6            15          36     225     -90
  40     30        -4             4          16      16     -16
  42     26        -2             0           4       0       0
  57     10        13           -16         169     256    -208

Σx = 523   Σy = 306   Σu = -5   Σv = -6   Σu² = 255   Σv² = 704   Σuv = -306

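$$\sum u = -5 \quad \Rightarrow \quad \bar{u} = \frac{\sum u}{n} = \frac{-5}{12} = -0.4167$$
$$\sum v = -6 \quad \Rightarrow \quad \bar{v} = \frac{\sum v}{n} = \frac{-6}{12} = -0.5$$
$$\bar{x} = A + \bar{u} = 44 - 0.4167 = 43.5833 \approx 43.58$$
$$\bar{y} = B + \bar{v} = 26 - 0.5 = 25.50$$
$$b_{yx} = b_{vu} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} = \frac{12 \times (-306) - (-5) \times (-6)}{12 \times 255 - (-5)^2} = \frac{-3672 - 30}{3060 - 25} = \frac{-3702}{3035} = -1.22$$
$$b_{xy} = b_{uv} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2} = \frac{12 \times (-306) - (-5) \times (-6)}{12 \times 704 - (-6)^2} = \frac{-3672 - 30}{8448 - 36} = \frac{-3702}{8412} = -0.44$$
Regression Equation of Y on X
$$y - \bar{y} = b_{yx}(x - \bar{x})$$
$$\Rightarrow \quad y - 25.50 = -1.22(x - 43.58) = -1.22x + 1.22 \times 43.58 = -1.22x + 53.17$$
$$\Rightarrow \quad y = -1.22x + 53.17 + 25.50 = -1.22x + 78.67$$
Regression Equation of X on Y
$$x - \bar{x} = b_{xy}(y - \bar{y})$$
$$\Rightarrow \quad x - 43.58 = -0.44(y - 25.50) = -0.44y + 0.44 \times 25.50 = -0.44y + 11.22$$
$$\Rightarrow \quad x = -0.44y + 43.58 + 11.22 = -0.44y + 54.80$$

Correlation Coefficient. We have :
$$r^2 = b_{yx} \cdot b_{xy} = (-1.22) \times (-0.44) = 0.5368$$
$$\Rightarrow \quad r = \pm\sqrt{0.5368} = \pm 0.7327$$

Since both the regression coefficients are negative, r must be
negative [c.f. Remark to §9.4.1]. Hence, discarding plus sign, we
get r = -0.7327.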
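The column totals and coefficients of Example 9.4 can be checked mechanically. In this sketch the variable names are illustrative, and full precision is kept throughout, so the results agree with the hand computation only after rounding.

```python
import math

xs = [43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 42, 57]
ys = [29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10]
A, B = 44, 26                     # arbitrary origins used in the example

u = [x - A for x in xs]
v = [y - B for y in ys]
n = len(u)

sum_u, sum_v = sum(u), sum(v)
sum_uu = sum(ui * ui for ui in u)
sum_vv = sum(vi * vi for vi in v)
sum_uv = sum(ui * vi for ui, vi in zip(u, v))

numerator = n * sum_uv - sum_u * sum_v
b_yx = numerator / (n * sum_uu - sum_u ** 2)   # formula (9.41b)
b_xy = numerator / (n * sum_vv - sum_v ** 2)   # formula (9.41a)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(sum_u, sum_v, sum_uu, sum_vv, sum_uv)
print(round(b_yx, 2), round(b_xy, 2), round(r, 4))
```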


Example 9.5. The following data about the sales and advertise- 
ment expenditure of a firm is given below : 


Sales Advertisement 
(in crores of Rs.) expenditure 
(in crores of Rs.) 
Means 40 6 
Standard deviations 10 1.5 


Coefficient of correlation=r=0.9 
(i) Estimate the likely sales for a proposed advertisement 
expenditure of Rs. 10 crores. 


(ii) What should be the advertisement expenditure if the firm 
proposes a sales target of 60 crores of rupees ? 
[Delhi U. B. Com. (Hons.) 11, 1985] 


Solution. Let the variable x denote the sales (in crores of Rs.)
and the variable y denote the advertisement expenditure (in crores
of Rs.). Then, in usual notations, we are given :
$$\bar{x} = 40, \; \sigma_x = 10; \quad \bar{y} = 6, \; \sigma_y = 1.5; \quad r = r_{xy} = 0.9.$$
(i) To estimate the likely sales (x) for given advertisement
expenditure (y), we need the regression equation of x on y which is
given by :
$$x - \bar{x} = \frac{r\sigma_x}{\sigma_y}(y - \bar{y}) \quad \Rightarrow \quad x = \frac{r\sigma_x}{\sigma_y}(y - \bar{y}) + \bar{x}$$
$$\Rightarrow \quad x = \frac{0.9 \times 10}{1.5}(y - 6) + 40 = 6(y - 6) + 40 \tag{*}$$
Hence the estimated sales (x) for a proposed advertisement
expenditure (y) of Rs. 10 crores are obtained on putting y = 10 in
(*) and are given by :
$$x = 6(10 - 6) + 40 = 6 \times 4 + 40 = 64 \text{ crores of Rs.}$$
(ii) To estimate the advertisement expenditure (y) for proposed
sales (x), we need the equation of the line of regression of y on x
which is given by :
$$y - \bar{y} = \frac{r\sigma_y}{\sigma_x}(x - \bar{x}) \quad \Rightarrow \quad y = \frac{r\sigma_y}{\sigma_x}(x - \bar{x}) + \bar{y}$$
$$\Rightarrow \quad y = \frac{0.9 \times 1.5}{10}(x - 40) + 6 = 0.135(x - 40) + 6 \tag{**}$$
Hence the likely advertisement expenditure (y) of the firm for
proposed sales target (x) of 60 crores of Rs. is obtained on taking
x = 60 in (**) and is given by :
$$y = 0.135(60 - 40) + 6 = 0.135 \times 20 + 6 = 2.7 + 6 = 8.7 \text{ crores of Rs.}$$
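When only means, standard deviations and r are available, as in Example 9.5, the two estimates follow from (9.13) and (9.24). The helper names below are illustrative assumptions.

```python
def estimate_y_given_x(x, x_bar, y_bar, sigma_x, sigma_y, r):
    """Estimate y from the line of regression of y on x, as in (9.13)."""
    return y_bar + r * sigma_y / sigma_x * (x - x_bar)


def estimate_x_given_y(y, x_bar, y_bar, sigma_x, sigma_y, r):
    """Estimate x from the line of regression of x on y, as in (9.24)."""
    return x_bar + r * sigma_x / sigma_y * (y - y_bar)


# x = sales, y = advertisement expenditure (both in crores of Rs.)
stats = dict(x_bar=40, y_bar=6, sigma_x=10, sigma_y=1.5, r=0.9)

sales = estimate_x_given_y(10, **stats)        # part (i)
advertising = estimate_y_given_x(60, **stats)  # part (ii)
print(round(sales, 2), round(advertising, 2))
```

Note that each question uses its own regression line: sales are estimated from the x on y line and advertisement expenditure from the y on x line.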


Example 9.6. For some bivariate data, the following results
were obtained :
Mean value of the variable x = 53.2
Mean value of the variable y = 27.9
Regression coefficient of y on x = -1.5
Regression coefficient of x on y = -0.2
What is the most likely value of y when x = 60 ? What is the
coefficient of correlation between x and y ?
[Delhi U. B.A. (Econ. Hons. I) 1985 ; I.C.W.A. (Final) June 1984]


Solution. We are given :
$$\bar{x} = 53.2, \quad \bar{y} = 27.9, \quad b_{yx} = -1.5, \quad b_{xy} = -0.2$$
To obtain the most likely value of y for given x, we have to
find the equation of the line of regression of y on x which is given
by :
$$y - \bar{y} = b_{yx}(x - \bar{x}) \quad \Rightarrow \quad y = b_{yx}(x - \bar{x}) + \bar{y}$$
$$\text{i.e.,} \quad y = -1.5(x - 53.2) + 27.9 = -1.5x + 1.5 \times 53.2 + 27.9$$
$$\Rightarrow \quad y = -1.5x + 79.8 + 27.9$$
$$\Rightarrow \quad y = -1.5x + 107.7 \tag{*}$$
Putting x = 60 in (*), the most likely estimate of y is given by
$$y = -1.5 \times 60 + 107.7 = -90 + 107.7 = 17.7$$
The coefficient of correlation between x and y is given by :
$$r_{xy}^2 = b_{yx} \cdot b_{xy} = (-1.5) \times (-0.2) = 0.30$$
$$\Rightarrow \quad r_{xy} = \pm\sqrt{0.30} = \pm 0.5477$$
Since the regression coefficients are negative, $r_{xy}$ must also be
negative. Hence we take $r_{xy} = -0.5477$.

Example 9.7. By using the following data, find out the two
lines of regression and from them compute the Karl Pearson's coefficient
of correlation.

$\sum X = 250$ ; $\sum Y = 300$ ; $\sum XY = 7,900$ ; $\sum X^2 = 6,500$ ;
$\sum Y^2 = 10,000$ ; and N = 10

[Delhi U. B.Com. (Hons.) II, 1984]


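Solution. We have :
$$\bar{X} = \frac{\sum X}{N} = \frac{250}{10} = 25 ; \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{300}{10} = 30$$
$b_{YX}$ = Coefficient of regression of Y on X
$$= \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 6500 - (250)^2} = \frac{79000 - 75000}{65000 - 62500} = \frac{4000}{2500} = 1.6$$
$b_{XY}$ = Coefficient of regression of X on Y
$$= \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum Y^2 - (\sum Y)^2} = \frac{10 \times 7900 - 250 \times 300}{10 \times 10000 - (300)^2} = \frac{79000 - 75000}{100000 - 90000} = \frac{4000}{10000} = 0.4$$
Hence correlation coefficient $r_{XY}$ between X and Y is given
by :
$$r_{XY}^2 = b_{YX} \cdot b_{XY} = 1.6 \times 0.4 = 0.64$$
$$\Rightarrow \quad r_{XY} = \pm\sqrt{0.64} = \pm 0.8$$
Since the regression coefficients are positive, we take r = +0.8.

Regression Equations

Regression equation of Y on X :
$$Y - \bar{Y} = b_{YX}(X - \bar{X})$$
$$\Rightarrow \quad Y - 30 = 1.6(X - 25)$$
$$\Rightarrow \quad Y = 1.6X - 40 + 30$$
$$\Rightarrow \quad Y = 1.6X - 10$$
Regression equation of X on Y :
$$X - \bar{X} = b_{XY}(Y - \bar{Y})$$
$$\Rightarrow \quad X - 25 = 0.4(Y - 30)$$
$$\Rightarrow \quad X = 0.4Y - 12 + 25$$
$$\Rightarrow \quad X = 0.4Y + 13$$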
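When only the totals are supplied, as in Example 9.7, both lines can be obtained directly from the sums. The function name `lines_from_sums` and its return layout are illustrative choices made for this sketch.

```python
def lines_from_sums(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Return ((a, b_yx), (A, b_xy)) for the lines y = a + b_yx*x and x = A + b_xy*y."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_xx - sum_x ** 2)
    b_xy = numerator / (n * sum_yy - sum_y ** 2)
    a = y_bar - b_yx * x_bar   # both lines pass through (x_bar, y_bar)
    A = x_bar - b_xy * y_bar
    return (a, b_yx), (A, b_xy)


(a, b_yx), (A, b_xy) = lines_from_sums(
    n=10, sum_x=250, sum_y=300, sum_xy=7900, sum_xx=6500, sum_yy=10000
)
r = (b_yx * b_xy) ** 0.5       # positive root, since both coefficients are positive
print(a, b_yx, A, b_xy, r)
```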
EXERCISE 9.1

1. (a) Explain the concept of regression and point out its usefulness in
dealing with business problems. [Delhi Uni. MBA 1974]

(b) What is a scatter diagram ? Indicate by means of suitable scatter
diagrams different types of correlation that may exist between the variables in
bivariate data. What are regression lines ? Write down the main points of
distinction between correlation analysis and regression analysis.
[I.C.W.A. (Final), June 1976]

2. (a) What is regression analysis ? How does it differ from correlation ?
Why are there, in general, two regression equations ?
[Osmania U. B. Com. (Hons.) April 1983]

(b) What do you mean by regression ? Why are there two regression
lines in case of a bivariate series ?
[Punjabi U. M.A. (Econ.) 1982; Himachal Pradesh U. M.A. (Econ.) Feb. 1983]

3. Write down the equations of the two regression lines, explaining the
constants used. Why should we use two different regression lines ? Can the
two lines coincide ? If so, when ? [Bombay U. B. Com., Nov., 1982]

4. (a) It is said that regression equations are irreversible, meaning
thereby that you cannot find out the regression equation of x on y from that of
y on x. Justify the comment with special reference to the principle of least
squares. [Delhi U. B.A. Econ. (Hons.) 1980]

(b) Explain the term ‘Regression’. Why do we take, in general, two
regression lines ? When are the regression lines (i) perpendicular to each other
and (ii) coincident ? [I.C.W.A. (Final) June 1984]

5. (a) Define regression coefficients. What information do they
supply ?

(b) What are "regression lines" ? Explain. Define $b_{xy}$, $b_{yx}$ and $r_{xy}$ and ...
[Calicut Uni. M.A. (Econ.) 1975]


6. Obtain the equations of the two lines of regression for the data given
below :

x :  1   2   3   4   5   6   7   8   9
y :  9   8  10  12  11  13  14  16  15

[I.C.W.A. (Final), Dec. 1978 ; Punjabi Uni. M.A. (Econ.), 1978 ;
Bangalore Uni. B.Com., April 1981]
Ans. Y = 0.95X + 7.25 ; X = 0.95Y - 6.4


7. From the following data of the age of husband and the age of wife,
form two regression lines and calculate the husband's age when the wife's age
is 16.


Husband’s age : 36, 23, 


21,198, DRY 99, МЕҢ: 131,533, 35. 
Wife's age so LS fF 


QU ООУ Sofa Bi 20) 2.27, 29,728. 
[Bangalore Uni., B.Com., April 1978] 
; — Wife'sage:y 


x-0:8y4-10 ; (*), 


Ans. Husband's age : х 
у=0°95х—3-5; 


=16 7228 


8. From the following data between the ages of husbands and wives,
calculate the two regression equations and find the husband's age when the wife's
age is 20 and the wife's age when the husband's age is 30.

Wife's age (X)    : 18 20 22 23 27 28 30
Husband's age (Y) : 23 25 27 30 32 31 35

[Himachal Pradesh Uni. B.Com., April 1982]
Ans. 25.3388 ≈ 25 ; 25.0189 ≈ 25


9. You are given the data relating to purchases and sales. Obtain the
two regression equations by the method of least squares and estimate the likely
sales when the purchases equal 100.

Purchases :  62  72  98  76  81  56  76  92  88  49
Sales     : 112 124 131 117 132  96 120 136  97  85
[C.A. (Intermediate), May 1975]
Ans. Purchases : x ; Sales : y ; $x = 0.6515y + 0.0775$ ;
$y = 0.7825x + 56.3125$ ; 134.5625


10. The following table gives the age of cars of a certain make and
annual maintenance costs. Obtain the regression equation for costs related to
age.

Age of cars (in years)                  :  2   4   6   8
Maintenance cost (in hundreds of Rs.)   : 10  20  25  30

[C.A. (Intermediate), May 1981]
Ans. x : Age ; y : Cost ; $y = 3.25x + 5$


11. The following table gives the ages and blood pressure of 10
women.

Age (X)            :  56  42  36  47  49  42  60  72  63  55
Blood pressure (Y) : 147 125 118 128 145 140 155 160 149 150

(i) Find the correlation coefficient between X and Y.
(ii) Determine the least square regression equation of Y on X.
(iii) Estimate the blood pressure of a woman whose age is 45 years.
(Delhi Uni. M.B.A. 1976)
Ans. (i) $r = 0.89$ (ii) $Y = 83.758 + 1.11X$
(iii) When $X = 45$, $Y = 134$


12. A panel of two judges P and Q graded seven dramatic performances
by independently awarding marks as follows :

Performance :  1   2   3   4   5   6   7
Marks by P  : 46  42  44  40  43  41  45
Marks by Q  : 40  38  36  35  39  37  41

The eighth performance, which Judge Q could not attend, was awarded
37 marks by Judge P. If Judge Q had also been present, how many marks would
be expected to have been awarded by him to the eighth performance?

[Delhi Uni. B. Com. (Hons.) 1975]
Ans. 33.5 ≈ 34

13. A department store gives in-service training to its salesmen, which
is followed by a test. It is considering whether it should terminate the services
of any salesman who does not do well in the test. The following data give the
test scores and sales made by nine salesmen during a certain period :

Test scores     : 14  19  24  21  26  22  15  20  19
Sales ('00 Rs.) : 31  36  48  37  50  45  33  41  39

Calculate the coefficient of correlation between the test scores and the
sales. Does it indicate that the termination of services of low test scorers is
justified? If the firm wants a minimum sales volume of Rs. 3,000, what is the
minimum test score that will ensure continuation of service?

(C.A. Intermediate, Nov. 1974)
Ans. $r = 0.9476$ ; Yes ; 14


14. The following table gives the normal weight of a baby during the
first six months of life :

Age in months  : 0  2  3   5   6
Weight in lbs. : 5  7  8  10  12

Estimate the weight of a baby at the age of 4 months.
(Punjab Uni. B. Com. II, Sep. 1981)

Ans. 9.2982 lbs.


15. The correlation coefficient between the variables x and y is $r = 0.60$.
If $\sigma_x = 1.50$, $\sigma_y = 2.00$, $\bar{x} = 10$ and $\bar{y} = 20$, find the equations of the regression lines :

(i) y on x ; (ii) x on y
[Delhi U. B.A. (Econ. Hons. I) 1984 ; I.C.W.A. (Final) December 1977]
Ans. $y = 0.8x + 12$ ; $x = 0.45y + 1$


16. Estimate the loss in production in a week when the number of
workers on strike is 1800, from the following data :

Mean no. of workers on strike = 800
Mean loss of daily production in '000 Rs. = 35
Standard deviation of no. of workers on strike = 100
S.D. of loss of daily production in '000 Rs. = 2
Coefficient of correlation between no. of workers on strike and daily production loss = 0.8

(Bombay U. B.Com. April 1982)
Ans. Rs. 3,06,000 (assuming a 6-day week).


17. For a bivariate data the mean value of X is 20 and the mean value
of Y is 45. The regression coefficient of Y on X is 4 and that of X on Y is 1/9.
Find :
(i) the coefficient of correlation,
(ii) the standard deviation of X if the standard deviation of Y is 12.
(iii) Also write down the equations of regression lines.
[C.A. (Intermediate) (N.S.) November 1982]

Ans. (i) 0.67, (ii) $\sigma_x = 2$, (iii) Regression equations of y on x and x on y
are respectively : $y = 4x - 35$, $9x = y + 135$

18. In a correlation study, the following values are obtained :

                                X      Y
Mean                           65     67
Standard deviation             2.5    3.5
Coefficient of correlation        0.8

Find the two regression equations that are associated with the above
values. [Delhi Uni. B. Com. (Hons.), 1979 ; Gauhati Uni. B.Com., Oct. 1978]

Ans. $y = 1.12x - 5.8$ ; $x = 0.57y + 26.81$

19. From the following data, find out the probable yield when the
rainfall is 29″ :

                         Rainfall    Yield
Mean                       25″       40 units per hectare
Standard deviation          3″        6 units per hectare

Correlation coefficient between rainfall and production = −0.8
[Osmania U. B.Com. (Hons.) April 1983]

Ans. 33.6 (units per hectare).
20. The following results were worked out from scores in Statistics
and Mathematics in a certain examination :

                      Scores in Statistics (X)    Scores in Mathematics (Y)
Mean                          39.5                        47.5
Standard deviation            10.8                        17.8

Karl Pearson's correlation coefficient between X and Y = +0.42.
Find both the regression lines. Use these regressions to estimate the
value of Y for X = 50 and also estimate the value of X for Y = 30.
[Delhi U. B.Com. (Hons.) 1981]
Ans. $y = 0.692x + 20.166$ ; $x = 0.255y + 27.3875$ ; when $x = 50$, $y = 54.77$ ;
when $y = 30$, $x = 35.04$
21. Find out the regression equation showing the regression of capacity
utilization on production from the following data :

                                        Average    Standard deviation
Production (in lakh units)               35.6          10.5
Capacity utilization (in percentage)     84.8           8.5
Coefficient of correlation r = 0.62

Estimate the production when the capacity utilization is 70%.

[Allahabad U. M.B.A. 1982 ; Himachal Pradesh Uni. M.B.A. 1976 ;
Delhi Uni. M.B.A. Dec. 1980]
Ans. 24.26468

22. Find out the regression coefficients of Y on X and X on Y on the
basis of the following data :
$\sum X = 50$, $\bar{X} = 5$ ; $\sum Y = 60$, $\bar{Y} = 6$,
$\sum XY = 350$, variance of X = 4, variance of Y = 9.
[Delhi Uni. B.A. (Econ. Hons.), 1981]

Ans. $b_{yx} = 1.25$, $b_{xy} = 0.56$

23. From the following information, calculate the regression
equation of X on Y and Y on X :
$\sum X = 30$, $\sum Y = 40$, $\sum XY = 214$, $\sum X^2 = 220$, $\sum Y^2 = 340$, $N = 5$.
[Delhi Uni. (B.A. Econ. Hons. I) (O.C.), 1983]

Ans. $X = -1.30Y + 16.4$ ; $Y = -0.65X + 11.9$


24. Find the regression equation of X on Y and the coefficient of
correlation from the following data :
..., $N = 10$
[C.A. (Intermediate) November 1985]

Ans. $X = 0.58Y + 3.68$ ; $r = 0.37$


9.5. To Find the Mean Values $(\bar{x}, \bar{y})$ from the Two Lines of
Regression. Let us suppose that the two lines of regression are :

\[ a_1 x + b_1 y + c_1 = 0 \tag{9.42} \]

and

\[ a_2 x + b_2 y + c_2 = 0 \tag{9.43} \]

We have already discussed that both the lines of regression
pass through the point $(\bar{x}, \bar{y})$. In other words, $(\bar{x}, \bar{y})$ is the point
of intersection of the two lines of regression. Hence, solving (9.42)
and (9.43) simultaneously, we get

\[ \frac{x}{b_1 c_2 - b_2 c_1} = \frac{y}{c_1 a_2 - c_2 a_1} = \frac{1}{a_1 b_2 - a_2 b_1} \]

Thus the mean values $(\bar{x}, \bar{y})$ are given by :

\[ \bar{x} = \frac{b_1 c_2 - b_2 c_1}{a_1 b_2 - a_2 b_1}, \qquad \bar{y} = \frac{c_1 a_2 - c_2 a_1}{a_1 b_2 - a_2 b_1} \tag{9.44} \]
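As a quick numerical check of (9.44), the cross-multiplication rule can be written as a small Python function. This is a sketch only: the name `mean_point` and the representation of a line as a tuple `(a, b, c)` are assumptions made for illustration. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def mean_point(line1, line2):
    """Point of intersection of a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0,
    i.e. the means (x_bar, y_bar) when the two lines are regression lines."""
    a1, b1, c1 = map(Fraction, line1)
    a2, b2, c2 = map(Fraction, line2)
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or coincident")
    x_bar = (b1 * c2 - b2 * c1) / det
    y_bar = (c1 * a2 - c2 * a1) / det
    return x_bar, y_bar

# Lines 8x - 10y + 66 = 0 and 40x - 18y - 214 = 0
x_bar, y_bar = mean_point((8, -10, 66), (40, -18, -214))
print(x_bar, y_bar)
```

For the two lines used in the call, the determinant is 256 and the numerators are 3328 and 4352, so the means come out as 13 and 17.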


9.6. To Find the Regression Coefficients and the Correlation
Coefficient from the Two Lines of Regression. Let (9.42) and
(9.43) be the given lines of regression and let us suppose that (9.42)
is the line of regression of y on x and (9.43) is the line of regression
of x on y. To obtain $b_{yx}$, the coefficient of regression of y on x,
write the regression equation of y on x in the form $y = a + bx$. Then
b, the coefficient of x, gives the value of $b_{yx}$. Similarly, to obtain
$b_{xy}$, write the equation of regression of x on y in the form $x = A + By$.
Then B, the coefficient of y, gives $b_{xy}$. Therefore, re-writing
(9.42) we get the regression equation of y on x :

\[ y = -\frac{a_1}{b_1}\,x - \frac{c_1}{b_1} \quad\Rightarrow\quad b_{yx} = -\frac{a_1}{b_1} \tag{9.45} \]

Similarly, re-writing (9.43), we get the regression equation of x on
y as :

\[ x = -\frac{b_2}{a_2}\,y - \frac{c_2}{a_2} \quad\Rightarrow\quad b_{xy} = -\frac{b_2}{a_2} \tag{9.46} \]

The correlation coefficient r between x and y can now be
obtained by using the formula

\[ r^2 = b_{yx}\, b_{xy} = \left(-\frac{a_1}{b_1}\right)\left(-\frac{b_2}{a_2}\right) = \frac{a_1 b_2}{a_2 b_1}
\quad\Rightarrow\quad r = \pm\sqrt{\frac{a_1 b_2}{a_2 b_1}} \tag{9.47} \]

the sign to be taken before the square root being the same as that of the
regression coefficients. If the regression coefficients are positive, we
take the positive sign and if they are negative, we take the negative sign in
(9.47).


Remarks: Given the two lines of regression (9.42) and (9.43),
how do we determine which is the line of regression of y on x and which
is the line of regression of x on y? Incidentally, the above discussion
enables us to answer this question. By supposing (9.42) and
(9.43) to be the equations of the lines of regression of y on x and x on
y respectively, we can obtain $b_{yx}$ and $b_{xy}$ and hence $r^2$. If $r^2 \le 1$,
our supposition, i.e., that (9.42) is the equation of regression of y on x and
(9.43) is the equation of regression of x on y, is true. However, if $r^2$
comes out to be greater than 1, then our supposition is wrong because
$r^2$ must lie between 0 and 1. In this case we shall conclude
that (9.42) is the equation of regression of x on y and (9.43) is the
equation of regression of y on x.
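The trial-and-check procedure described in the Remarks can be sketched as follows. The function name `identify_lines`, the tuple form `(a, b, c)` for the line $ax + by + c = 0$, and the returned tuple are all assumptions made for illustration; the sketch also assumes that the coefficients of x and y in both lines are non-zero.

```python
from fractions import Fraction
from math import sqrt

def identify_lines(line1, line2):
    """Decide which of two lines a*x + b*y + c = 0 is the regression of y on x.

    Returns (first_is_y_on_x, b_yx, b_xy, r)."""
    a1, b1, _ = map(Fraction, line1)
    a2, b2, _ = map(Fraction, line2)
    # Trial supposition: line1 is y on x, line2 is x on y.
    b_yx, b_xy = -a1 / b1, -b2 / a2
    first_is_y_on_x = True
    if b_yx * b_xy > 1:
        # r^2 > 1 is impossible, so the roles must be the other way round.
        b_yx, b_xy = -a2 / b2, -b1 / a1
        first_is_y_on_x = False
    r_squared = b_yx * b_xy
    if r_squared < 0:
        raise ValueError("regression coefficients have opposite signs")
    r = sqrt(r_squared)
    if b_yx < 0:
        r = -r          # r takes the common sign of the two coefficients
    return first_is_y_on_x, b_yx, b_xy, r

# Lines 4x - 5y + 30 = 0 and 20x - 9y - 107 = 0
print(identify_lines((4, -5, 30), (20, -9, -107)))
```

Swapping the two arguments gives the same coefficients with `first_is_y_on_x` reversed, since the product of the trial coefficients is then 25/9 and the supposition is rejected.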


Example 9.8. The lines of regression of a bivariate population
are :
$8x - 10y + 66 = 0$ ...(*)
$40x - 18y = 214$ ...(**)
The variance of x is 9. Find
(i) The mean values of x and y.
(ii) Correlation coefficient between x and y.
(iii) Standard deviation of y.
[I.C.W.A. (Final) June 1977 ; C.A. (Intermediate) November 1977 ;
Punjabi U. M.A. (Econ.) 1979 ; A.I.M.A. (Diploma in Management) July 1981]

Solution. (i) Since both the lines of regression pass through
the mean values, the point $(\bar{x}, \bar{y})$ must satisfy (*) and (**). Hence
we get

$8x - 10y + 66 = 0$ ...(1)
$40x - 18y - 214 = 0$ ...(2)

Multiplying (1) by 5 we get

$40x - 50y + 330 = 0$ ...(3)

Subtracting (3) from (2) we get :

\[ 32y - 544 = 0 \quad\Rightarrow\quad y = \frac{544}{32} = 17 \]

Substituting in (1) we find

\[ 8x - 10 \times 17 + 66 = 0 \quad\Rightarrow\quad 8x = 170 - 66 = 104 \quad\Rightarrow\quad x = \frac{104}{8} = 13 \]

Hence the mean values are given by : $\bar{x} = 13$, $\bar{y} = 17$.

(ii) Let us suppose that (*) is the equation of the line of regression
of y on x and (**) is the equation of the line of regression of x
on y.
Rewriting (*) we get

\[ 10y = 8x + 66 \quad\Rightarrow\quad y = \frac{8}{10}\,x + \frac{66}{10} \]

∴ $b_{yx}$ = Coefficient of regression of y on x = $\frac{8}{10} = \frac{4}{5}$.
Similarly, rewriting (**), we get :

\[ 40x = 18y + 214 \quad\Rightarrow\quad x = \frac{18}{40}\,y + \frac{214}{40} \]

∴ $b_{xy}$ = Coefficient of regression of x on y = $\frac{18}{40} = \frac{9}{20}$.

\[ \text{Hence}\quad r^2 = b_{yx} \cdot b_{xy} = \frac{4}{5} \times \frac{9}{20} = \frac{9}{25}
\quad\Rightarrow\quad r = \pm\frac{3}{5} = \pm 0.6 \]

Since both the regression coefficients are positive, r must be
positive. Hence we take $r = 0.6$.
(iii) We are given :
$\sigma_x^2 = 9 \Rightarrow \sigma_x = \pm 3$.

But since standard deviation is always non-negative, we take
$\sigma_x = 3$.
We have :

\[ b_{yx} = \frac{r\,\sigma_y}{\sigma_x} \quad\Rightarrow\quad \frac{4}{5} = \frac{3}{5} \times \frac{\sigma_y}{3} \quad\Rightarrow\quad \sigma_y = 4 \]


Remarks 1. It can be verified that the values $\bar{x} = 13$ and
$\bar{y} = 17$ as obtained in part (i) satisfy both the equations (*) and (**).
In numerical problems of this type, this check should invariably be
applied to ascertain the correctness of the answer.

2. If we had assumed that (*) is the equation of the line of
regression of x on y and (**) is the equation of the line of regression
of y on x, then rewriting (*) and (**) we get respectively :

$8x = 10y - 66$ and $18y = 40x - 214$

\[ \Rightarrow\quad x = \frac{10}{8}\,y - \frac{66}{8} \quad\text{and}\quad y = \frac{40}{18}\,x - \frac{214}{18} \]

\[ \Rightarrow\quad b_{xy} = \frac{10}{8} \quad\text{and}\quad b_{yx} = \frac{40}{18} \]

\[ \therefore\quad r^2 = b_{xy} \cdot b_{yx} = \frac{10}{8} \times \frac{40}{18} = \frac{25}{9} = 2.78 \]

But since $r^2$ always lies between 0 and 1, i.e., since $r^2$ cannot
exceed 1, our supposition that (*) is the line of regression of x on y and
(**) is the line of regression of y on x is wrong.


Example 9.9. Find the mean values of the variables X and Y
and the correlation coefficient between them from the following regression
equations :
$2Y - X - 50 = 0$
$3Y - 2X - 10 = 0$.
[Delhi U. B. Com. (Hons.) 1983]
Solution. We are given the two lines of regression as :
$2y - x - 50 = 0$ ...(i)
$3y - 2x - 10 = 0$ ...(ii)

Since both the lines of regression pass through the point $(\bar{x}, \bar{y})$,
the mean values of the variables x and y are obtained as the point
of intersection of (i) and (ii). Multiplying (i) by 2 and subtracting
(ii) from it we get

$4y - 2x - 100 = 0$
$3y - 2x - 10 = 0$
------------------
$y - 90 = 0 \Rightarrow y = 90$

Substituting in (i) we get
$x = 2y - 50 = 2 \times 90 - 50 = 130$

Hence the mean values of x and y are given by :

$\bar{x} = 130$, $\bar{y} = 90$

To obtain the coefficient of correlation $r_{xy}$, let us suppose that
(i) is the line of regression of y on x and (ii) is the line of regression
of x on y. Rewriting (i) and (ii) we get :

\[ 2y = x + 50 \quad\Rightarrow\quad y = \tfrac{1}{2}\,x + 25 \quad\Rightarrow\quad b_{yx} = \tfrac{1}{2} \]

\[ \text{and}\quad 2x = 3y - 10 \quad\Rightarrow\quad x = \tfrac{3}{2}\,y - 5 \quad\Rightarrow\quad b_{xy} = \tfrac{3}{2} \]

\[ \therefore\quad r_{xy} = \pm\sqrt{b_{yx} \cdot b_{xy}} = \pm\sqrt{\tfrac{1}{2} \times \tfrac{3}{2}} = \pm\frac{\sqrt{3}}{2} = \pm 0.866 \]

Since both the regression coefficients are positive, $r_{xy}$ must be
positive. Hence we take $r_{xy} = 0.866$.


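that it cannot exceed 1/4. If $k = 1/16$, find the means of the two variables
and the coefficient of correlation between them.

[I.C.W.A. (Final) June 1984]
Solution. Line of regression of x on y is :

$x = 4y + 5 \Rightarrow b_{xy} = 4$

Line of regression of y on x is :

$y = kx + 4 \Rightarrow b_{yx} = k$

\[ \therefore\quad r^2 = b_{yx} \cdot b_{xy} = 4k \tag{*} \]

But we know that

\[ 0 \le r^2 \le 1 \quad\Rightarrow\quad 0 \le 4k \le 1 \quad\Rightarrow\quad 0 \le k \le \tfrac{1}{4} \]

as required.

If $k = \tfrac{1}{16}$, then from (*) we get :

\[ r^2 = 4 \times \frac{1}{16} = \frac{1}{4} \quad\Rightarrow\quad r = \pm\frac{1}{2} = \pm 0.5 \]

But since both the regression coefficients are positive, r must
be positive. Hence we take $r = 0.5$.

For $k = \tfrac{1}{16}$, the two lines of regression become
$x = 4y + 5$ and $y = \tfrac{1}{16}x + 4$
$\Rightarrow\quad x - 4y - 5 = 0$ ...(**)
and $x - 16y + 64 = 0$ ...(***)
$(\bar{x}, \bar{y})$ is the point of intersection of the lines (**) and (***).
Thus, we have to solve (**) and (***).
Subtracting (***) from (**) we get
$12y - 69 = 0 \Rightarrow y = \frac{69}{12} = 5.75$
Substituting in (**) we get
$x = 4y + 5 = 4 \times 5.75 + 5 = 23 + 5 = 28$

Hence the mean values of the variables are $\bar{x} = 28$, $\bar{y} = 5.75$.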


Example 9.11. If the two lines of regression are :

$4x - 5y + 30 = 0$ and $20x - 9y - 107 = 0$,
which of these is the line of regression of x on y? Find $r_{xy}$ and $\sigma_y$
when $\sigma_x = 3$. [Punjab Uni. M.A. Econ., 1980]

Solution. We are given the regression lines as :

$4x - 5y + 30 = 0$ ...(i)
$20x - 9y - 107 = 0$ ...(ii)

Let (i) be the equation of the line of regression of x on y and
(ii) be the equation of the line of regression of y on x.

From (i), we get

\[ x = \frac{5}{4}\,y - \frac{30}{4} \quad\Rightarrow\quad b_{xy} = \frac{5}{4} \]

From (ii), we get

\[ y = \frac{20}{9}\,x - \frac{107}{9} \quad\Rightarrow\quad b_{yx} = \frac{20}{9} \]

\[ \text{Now,}\quad r^2 = b_{yx} \cdot b_{xy} = \frac{20}{9} \times \frac{5}{4} = \frac{25}{9} = 2.78 \]

Since $r^2 > 1$, our supposition is wrong
(as we always have $0 \le r^2 \le 1$).

Hence, (i) is the line of regression of y on x and (ii) is the line
of regression of x on y.

Rewriting (i), we get

\[ 5y = 4x + 30 \quad\Rightarrow\quad y = \frac{4}{5}\,x + 6 \]

$b_{yx}$ = Coefficient of regression of y on x = $\frac{4}{5}$.
Similarly, rewriting (ii), we get

\[ 20x = 9y + 107 \quad\Rightarrow\quad x = \frac{9}{20}\,y + \frac{107}{20} \]

$b_{xy}$ = Coefficient of regression of x on y = $\frac{9}{20}$.

\[ r^2 = \frac{4}{5} \times \frac{9}{20} = \frac{9}{25} = 0.36 \quad\Rightarrow\quad r = \pm\sqrt{0.36} = \pm 0.6 \]

Since both the regression coefficients are positive, r must be
positive. Hence we take $r = r_{xy} = 0.6$.


We are given $\sigma_x = 3$.

\[ \text{We have}\quad b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad\Rightarrow\quad \sigma_y = \frac{b_{yx}\,\sigma_x}{r} = \frac{\tfrac{4}{5} \times 3}{0.6} = 4 \]

Example 9.12. Comment on the following results obtained
from given data.
For a bivariate distribution :
Coefficient of regression of Y on X is 4.2 and
Coefficient of regression of X on Y is 0.50.
(I.C.W.A. Final, Dec. 1977)

Solution. We are given that

$b_{yx} = 4.2$ and $b_{xy} = 0.5$

We have

$r^2 = b_{yx} \cdot b_{xy} = 4.2 \times 0.5 = 2.1 > 1$

But we know that r cannot exceed unity numerically, i.e.,
$-1 \le r \le 1 \Rightarrow r^2 \le 1$.

Hence the given statement is wrong.
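The checks used in comments of this kind (the two regression coefficients must share a sign, and their product $r^2$ cannot exceed 1) can be collected in a short Python sketch. The function name `check_coefficients` and the wording of the returned messages are assumptions made for illustration.

```python
from math import sqrt

def check_coefficients(b_yx, b_xy):
    """Return a short verdict on whether b_yx and b_xy can belong to the same data."""
    if (b_yx == 0) != (b_xy == 0):
        # r = 0 forces both coefficients to vanish together.
        return "inconsistent: only one coefficient is zero"
    if b_yx * b_xy < 0:
        return "inconsistent: coefficients have opposite signs"
    r_squared = b_yx * b_xy
    if r_squared > 1:
        return f"inconsistent: r^2 = {r_squared:.2f} exceeds 1"
    r = sqrt(r_squared) if b_yx >= 0 else -sqrt(r_squared)
    return f"consistent: r = {r:.4f}"

print(check_coefficients(4.2, 0.5))    # the pair discussed in Example 9.12
print(check_coefficients(2.8, -0.3))   # coefficients of opposite sign
print(check_coefficients(0.8, 0.6))    # an admissible pair
```

The three calls illustrate the three possible outcomes: a product above 1, a sign conflict, and an admissible pair for which r is the signed square root of the product.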

9.6. Standard Error of an Estimate. The regression equations
enable us to estimate (predict) the value of the dependent variable
for any given value of the independent variable. The estimates so
obtained are, however, not perfect. A measure of the precision of
the estimates so obtained from the regression equations is provided
by the Standard Error (S.E.) of the estimate. Standard error is a
term analogous to standard deviation (which is a measure of dispersion
of the observations about the mean of the distribution) and
measures the scatter of the observations about the line of regression. Thus

$S_{yx}$ = S.E. of estimate of y for given x

\[ = \sqrt{\frac{1}{N}\sum (y - y_e)^2} \tag{9.48} \]

where $y_e$ is the estimated value of y for a given value of x obtained
from the line of regression of y on x.

Similarly,
$S_{xy}$ = S.E. of estimate of x for any given y

\[ = \sqrt{\frac{1}{N}\sum (x - x_e)^2} \tag{9.49} \]

where $x_e$ is the estimated value of x for a given value of y obtained
from the line of regression of x on y.


The computation of the standard error of estimates by the above
formulae is quite tedious as it requires the computation of the errors
of estimate $y - y_e$ for each x and $x - x_e$ for each y. However, a
much more convenient formula for numerical computations is given
below.

\[ S_{yx} = \sigma_y (1 - r^2)^{1/2} \tag{9.50} \]

\[ S_{xy} = \sigma_x (1 - r^2)^{1/2} \tag{9.51} \]

where $r = r_{xy}$ is the correlation coefficient between the two variables
x and y.

Age    :   1     2     3     4     5     6     7     8      9     10
Weight : 52.5  58.7  65.0  70.2  75.4  81.1  87.2  95.5  102.2  108.4
Also obtain the standard error of estimate of growth.

Solution. Let the random variable X denote age (in weeks)
and Y denote the weight. We are given N = 10.

CALCULATIONS FOR REGRESSION EQUATION

   x       y      x²       xy
   1     52.5      1      52.5
   2     58.7      4     117.4
   3     65.0      9     195.0
   4     70.2     16     280.8
   5     75.4     25     377.0
   6     81.1     36     486.6
   7     87.2     49     610.4
   8     95.5     64     764.0
   9    102.2     81     919.8
  10    108.4    100    1084.0
Total 55  796.2  385    4887.5

Let us consider the regression equation of y on x, viz.,

$y = a + bx$ ...(*)

Constants ‘a’ and ‘b’ are given by : [c.f. (9.7) and (9.8)]

\[ a = \frac{(\sum x^2)(\sum y) - (\sum x)(\sum xy)}{N\sum x^2 - (\sum x)^2}
     = \frac{385 \times 796.2 - 55 \times 4887.5}{10 \times 385 - (55)^2}
     = \frac{306537 - 268812.5}{3850 - 3025} = \frac{37724.5}{825} = 45.73 \]

\[ b = \frac{N\sum xy - (\sum x)(\sum y)}{N\sum x^2 - (\sum x)^2}
     = \frac{10 \times 4887.5 - 55 \times 796.2}{10 \times 385 - (55)^2}
     = \frac{48875 - 43791}{3850 - 3025} = \frac{5084}{825} = 6.16 \]
Substituting these values of a and b in (*), the equation of
the line of regression of y on x becomes :

$y = 45.73 + 6.16x$ ...(**)

The weights of the calf after 1, 2, 3, ... weeks as given by the
regression equation (**) are : $a + b$, $a + 2b$, $a + 3b$, ... Hence the
average rate of growth per week is b units, i.e., 6.16 units.

Standard Error of Estimate. From the regression equation we
have : when $x = 1$, $y = 45.73 + 6.16 = 51.89$ ;
when $x = 2$, $y = 45.73 + 6.16 \times 2 = 58.05$
and so on.

COMPUTATION OF S.E. OF ESTIMATE

   x       y       y_e     y − y_e   (y − y_e)²
   1     52.5     51.89     0.61      0.3721
   2     58.7     58.05     0.65      0.4225
   3     65.0     64.21     0.79      0.6241
   4     70.2     70.37    −0.17      0.0289
   5     75.4     76.53    −1.13      1.2769
   6     81.1     82.69    −1.59      2.5281
   7     87.2     88.85    −1.65      2.7225
   8     95.5     95.01     0.49      0.2401
   9    102.2    101.17     1.03      1.0609
  10    108.4    107.33     1.07      1.1449

$\sum (y - y_e)^2 = 10.421$

Standard Error of estimate of Y (for any given X), i.e., $S_{yx}$, is
given by

\[ S_{yx} = \sqrt{\frac{1}{N}\sum (y - y_e)^2} = \sqrt{\frac{1}{10} \times 10.421} = \sqrt{1.0421} = 1.02 \]
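The whole calculation for the calf data (the normal-equation formulas for a and b, the residual-based standard error, and the shortcut $S_{yx} = \sigma_y (1 - r^2)^{1/2}$) can be reproduced with the following Python sketch. Variable names are chosen here for illustration. The coefficients are kept unrounded, so the residual sum of squares differs slightly from the tabulated 10.421, which is based on a and b rounded to two decimals.

```python
from math import sqrt

ages = list(range(1, 11))                      # x: age in weeks
weights = [52.5, 58.7, 65.0, 70.2, 75.4,
           81.1, 87.2, 95.5, 102.2, 108.4]     # y: weight
n = len(ages)

sum_x, sum_y = sum(ages), sum(weights)
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in weights)
sum_xy = sum(x * y for x, y in zip(ages, weights))

# Least-squares constants of y = a + b*x
denom = n * sum_x2 - sum_x ** 2
a = (sum_x2 * sum_y - sum_x * sum_xy) / denom
b = (n * sum_xy - sum_x * sum_y) / denom

# Direct definition: root mean square of the errors of estimate y - y_e
residuals = [y - (a + b * x) for x, y in zip(ages, weights)]
se_direct = sqrt(sum(e * e for e in residuals) / n)

# Shortcut: S_yx = sigma_y * sqrt(1 - r^2), with divisor N throughout
var_x = sum_x2 / n - (sum_x / n) ** 2
var_y = sum_y2 / n - (sum_y / n) ** 2
cov = sum_xy / n - (sum_x / n) * (sum_y / n)
r_squared = cov ** 2 / (var_x * var_y)
se_shortcut = sqrt(var_y * (1 - r_squared))

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"S_yx (direct) = {se_direct:.2f}, S_yx (shortcut) = {se_shortcut:.2f}")
```

Both routes agree because, for the least-squares line, $\sum (y - y_e)^2 = N\sigma_y^2 (1 - r^2)$.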


9.7. Regression Equations for a Bivariate Frequency Table. The
computation of the correlation coefficient r for a bivariate frequency
table, commonly known as a correlation table, has been discussed in
Chapter 8. The calculation of r involves the computation of $\bar{x}$, $\bar{y}$,
$\sigma_x$, $\sigma_y$. Since the equations of the two lines of regression, viz., the line
of regression of y on x and of x on y, are respectively :

\[ y - \bar{y} = b_{yx}\,(x - \bar{x}) = \frac{r\,\sigma_y}{\sigma_x}\,(x - \bar{x}) \]

\[ x - \bar{x} = b_{xy}\,(y - \bar{y}) = \frac{r\,\sigma_x}{\sigma_y}\,(y - \bar{y}) \]

the calculations for obtaining these equations will be more or less the
same. However, it may be remarked here that the regression
coefficients $b_{yx}$ and $b_{xy}$ are independent of change of origin but not
of scale, i.e., if we take

\[ u = \frac{x - A}{h}, \qquad v = \frac{y - B}{k}, \]

then

\[ b_{yx} = \frac{k}{h}\, b_{vu} \qquad\text{and}\qquad b_{xy} = \frac{h}{k}\, b_{uv} \]

This point has to be borne in mind in computing the regression
coefficients. We will explain the technique by means of
examples.
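The effect of a change of origin and scale on the regression coefficients can be verified numerically. The sketch below borrows the car data of Exercise 10 (ages 2, 4, 6, 8 and costs 10, 20, 25, 30); the constants A = 4, h = 2, B = 20, k = 5 and the helper name `slope` are arbitrary choices made here for illustration.

```python
from math import isclose

def slope(xs, ys):
    """Regression coefficient of ys on xs: sum of cross-deviations / sum of squared x-deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

ages = [2, 4, 6, 8]          # x
costs = [10, 20, 25, 30]     # y

A, h = 4, 2                  # assumed origin and scale for x
B, k = 20, 5                 # assumed origin and scale for y
u = [(x - A) / h for x in ages]
v = [(y - B) / k for y in costs]

b_yx, b_vu = slope(ages, costs), slope(u, v)
b_xy, b_uv = slope(costs, ages), slope(v, u)

print(b_yx, (k / h) * b_vu)  # equal: b_yx = (k/h) * b_vu
print(b_xy, (h / k) * b_uv)  # equal: b_xy = (h/k) * b_uv
```

Changing A or B alone leaves `b_vu` and `b_uv` untouched; only the ratio of the scale factors h and k enters the conversion.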


Example 9.14. Family income and its percentage spent on food
in the case of hundred families gave the following bivariate frequency
distribution.

Food Expenditure                  Family income (Rs.)
(in %)          200–300   300–400   400–500   500–600   600–700
10–15              -         -         -         3         7
15–20              -         4         9         4         3
20–25              7         6        12         5         -
25–30              3        10        19         8         -

(a) Obtain the equations of the two lines of regression.
(b) Also compute the Standard Error of the estimates.

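Solution. Using the calculations of Example 8.14,
we get :

\[ \bar{x} = A + \frac{h \sum fu}{N} = 450 + \frac{100 \times 0}{100} = 450 \]

\[ \bar{y} = B + \frac{k \sum fv}{N} = 17.5 + \frac{5 \times 100}{100} = 22.5 \]

\[ b_{yx} = \frac{k}{h}\left[\frac{N\sum fuv - (\sum fu)(\sum fv)}{N\sum fu^2 - (\sum fu)^2}\right]
          = \frac{5}{100}\left[\frac{100 \times (-48) - 0 \times 100}{100 \times 120 - 0}\right]
          = \frac{1}{20} \times \frac{-4800}{12000} = -0.02 \]

\[ b_{xy} = \frac{h}{k}\left[\frac{N\sum fuv - (\sum fu)(\sum fv)}{N\sum fv^2 - (\sum fv)^2}\right]
          = \frac{100}{5}\left[\frac{-4800}{100 \times 200 - (100)^2}\right]
          = \frac{20 \times (-4800)}{10000} = -9.6 \]

Line of regression of y on x :
$y - \bar{y} = b_{yx}(x - \bar{x})$
$\Rightarrow y - 22.5 = -0.02\,(x - 450)$
$\Rightarrow y = -0.02x + 9 + 22.5$
$\Rightarrow y = -0.02x + 31.5$

Line of regression of x on y :
$x - \bar{x} = b_{xy}(y - \bar{y})$
$\Rightarrow x - 450 = -9.6\,(y - 22.5)$
$\Rightarrow x = -9.6y + 216 + 450$
$\Rightarrow x = -9.6y + 666$

Standard Error of Estimates. We have :

\[ \sigma_x = h\sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} = 100\sqrt{\frac{120}{100} - 0} = \sqrt{12000} = 109.545 \]

\[ \sigma_y = k\sqrt{\frac{\sum fv^2}{N} - \left(\frac{\sum fv}{N}\right)^2} = 5\sqrt{\frac{200}{100} - 1} = 5 \]

$r^2 = b_{yx} \cdot b_{xy} = (-0.02) \times (-9.6) = 0.192$
S.E. of estimate of y on x is given by :
$S_{yx} = \sigma_y (1 - r^2)^{1/2} = 5\,(1 - 0.192)^{1/2} = 5\sqrt{0.808} = 5 \times 0.899 = 4.495$
S.E. of estimate of x on y is given by :
$S_{xy} = \sigma_x (1 - r^2)^{1/2} = 109.545 \times 0.899 = 98.48$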
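The figures of Example 9.14 can be cross-checked directly from the frequency table, working with class mid-points rather than the coded variables u and v. The sketch below is illustrative only. It takes the cell for 20–25% food expenditure and Rs. 500–600 income as 5, which is the value required by the totals N = 100 and $\sum fuv = -48$ used in the solution.

```python
from math import sqrt

income_mid = [250, 350, 450, 550, 650]   # x: mid-points of the income classes (Rs.)
food_mid = [12.5, 17.5, 22.5, 27.5]      # y: mid-points of the food-expenditure classes (%)
freq = [                                 # rows follow food_mid, columns follow income_mid
    [0, 0, 0, 3, 7],
    [0, 4, 9, 4, 3],
    [7, 6, 12, 5, 0],                    # 5 assumed in the 500-600 column (see lead-in)
    [3, 10, 19, 8, 0],
]

# Flatten the table into (x, y, f) triples
cells = [(x, y, f)
         for y, row in zip(food_mid, freq)
         for x, f in zip(income_mid, row)]
n = sum(f for _, _, f in cells)

x_bar = sum(f * x for x, _, f in cells) / n
y_bar = sum(f * y for _, y, f in cells) / n
var_x = sum(f * (x - x_bar) ** 2 for x, _, f in cells) / n
var_y = sum(f * (y - y_bar) ** 2 for _, y, f in cells) / n
cov = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in cells) / n

b_yx = cov / var_x                       # regression coefficient of y on x
b_xy = cov / var_y                       # regression coefficient of x on y
r_squared = b_yx * b_xy
s_yx = sqrt(var_y * (1 - r_squared))     # S.E. of estimate of y on x
s_xy = sqrt(var_x * (1 - r_squared))     # S.E. of estimate of x on y

print(n, x_bar, y_bar)
print(f"b_yx = {b_yx:.2f}, b_xy = {b_xy:.1f}, r^2 = {r_squared:.3f}")
print(f"S_yx = {s_yx:.3f}, S_xy = {s_xy:.2f}")
```

Small differences in the last decimal of the standard errors, compared with the hand calculation, come from rounding $\sqrt{0.808}$ to 0.899 in the text.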


9.8. Correlation Analysis Vs. Regression Analysis.

1. Correlation literally means the relationship between two or
more variables which vary in sympathy so that the movements in
one tend to be accompanied by the corresponding movements in the
other(s). On the other hand, regression means stepping back or
returning to the average value and is a mathematical measure
expressing the average relationship between the two variables.

2. Regression analysis aims at establishing the functional relationship
between the two variables under study.

3. Correlation need not imply cause and effect relationship
between the variables under study. [For details see §8.1.2.]

4. Correlation coefficient $r_{xy}$ is a relative measure of the linear
relationship between x and y and is independent of the units of
measurement. It is a pure number lying between −1 and +1.

On the other hand, the regression coefficients, $b_{yx}$ and $b_{xy}$, are
absolute measures representing the change in the value of the variable
y (x) for a unit change in the value of the variable x (y). Once
the functional form of the regression curve is known, by substituting the
value of the independent variable we can obtain the value of the
dependent variable and this value will be in the units of measurement
of the variable.

5. There may be nonsense correlation between two variables
which is due to pure chance and has no practical relevance, e.g., the
correlation between the size of shoe and the intelligence of a group
of individuals. There is no such thing as nonsense regression.

6. Correlation analysis is confined only to the study of linear
relationship between the variables and, therefore, has limited applications.
Regression analysis has much wider applications as it
studies linear as well as non-linear relationships between the
variables.

EXERCISE 9.2 


1. The equations of two lines of regression obtained in a correlation
analysis are the following :

$2X = 8 - 3Y$ and $2Y = 5 - X$.
Obtain the value of the correlation coefficient.
[Delhi U. B.A. (Econ. Hons. I) 1985]

Ans. $r = -0.866$.
2. You are supplied with the following data :
$4x - 5y + 33 = 0$
$20x - 9y - 107 = 0$
Variance of x = 9
Calculate :
(i) the mean values of x and y
(ii) standard deviation of y
(iii) coefficient of correlation between x and y.

[Bombay U. B. Com. June 1981, May 1980 ; Punjabi U. M.A. (Econ.), 1982 ;
A.I.M.A. (Diploma in Management), Jan. 1982]

Ans. (i) $\bar{x} = 13$, $\bar{y} = 17$, (ii) $\sigma_y = 4$, (iii) $r_{xy} = 0.6$
3. The lines of regression of y on x and x on y are $y = 0.3x + 10.0$ and
$x = 1.2y + 0.8$, respectively. Determine the means of x and y, the ratio of the
standard deviations of x and y, and the correlation between x and y.

[Delhi U. B.A. (Econ. Hons. I) 1983]
Ans. $\bar{x} = 20$, $\bar{y} = 16$ ; $r_{xy} = 0.6$ ; $\sigma_x / \sigma_y = 2$.

4. Regression equations of two variables X and Y are as follows :
$3X + 2Y - 26 = 0$ ...(*)
$6X + Y - 31 = 0$ ...(**)

Find :

(i) the mean of X and Y,
(ii) the regression coefficient of X on Y,
(iii) the coefficient of correlation between X and Y,
(iv) the most probable value of Y when X = 5,
(v) $\sigma_y$ if $\sigma_x^2 = 25$.
[Delhi U. B.A. (Econ. Hons.) 1980 ; C.A. (Intermediate) November 1983]
Ans. (i) $\bar{x} = 4$, $\bar{y} = 7$, (ii) $b_{xy} = -1/6$, $b_{yx} = -3/2$, (iii) $r_{xy} = -0.5$, (iv) 5.5,
(v) 15.

5. Find the means of variables x and y and the correlation coefficient,
given the following :

Regression equation of y on x : $2y - x - 50 = 0$ ...(*)
Regression equation of x on y : $3y - 2x - 10 = 0$ ...(**)

[Delhi Uni. B. Com. (Hons.), 1983 ; Bombay Uni. B. Com. 1977]

Ans. $\bar{x} = 130$, $\bar{y} = 90$, $r_{xy} = 0.87$.

6. Out of the two lines of regression given by
$x + 2y - 5 = 0$ and $2x + 3y - 8 = 0$,
which one is the regression line of x on y and which of y on x?
Ans. Reg. of y on x : $x + 2y - 5 = 0$ ; Reg. of x on y : $2x + 3y - 8 = 0$
7. The two regression lines are given by
$3x + 5y - 42 = 0$ and $2x + y - 80 = 0$.
Obtain estimates of y when x = 10 and of x when y = 20.
(Bombay U. B.Com., Nov. 1980)
Ans. (i) $y = 2.4$ ; (ii) $x = 30$

8. The equations of two regression lines between two variables are
expressed as $2x - 3y = 0$ and $4y - 5x - 8 = 0$.

(i) Identify which of the two can be called regression of y on x and of
x on y.
(ii) Find $\bar{x}$ and $\bar{y}$ and the correlation coefficient (r) from the equations.
[I.C.W.A. (Final), June 1983]
Ans. (i) Reg. of y on x : $2x - 3y = 0$ ; x on y : $4y - 5x - 8 = 0$
(ii) $\bar{x} = -3.43$, $\bar{y} = -2.29$, $r = 0.73$.
9. For 50 students of a class the regression equation of marks in Statistics
(x) on marks in Accountancy (y) is $3y - 5x + 180 = 0$. The mean marks in
Accountancy is 44 and the variance of marks in Statistics is 9/16th of the variance
of marks in Accountancy. Find the mean marks in Statistics and the coefficient of
correlation between marks in the two subjects.
Ans. $\bar{x} = 62.4$ ; $r_{xy} = 0.8$

10. For fifty students of a class the regression equation of marks in
Statistics (y) on the marks in Accountancy (x) is $4y - 5x - 8 = 0$. Average marks
in Accountancy are 40. The ratio of the standard deviations $\sigma_y : \sigma_x$ is 5 : 2.
Find the average marks in Statistics and the coefficient of correlation between
the marks in the two subjects.

Ans. $\bar{y} = 52$, $r_{xy} = 0.5$

11. Regression of savings (S) of a family on income (Y) may be expressed
as $S = a + \frac{Y}{m}$, where a and m are constants. In a random sample of 100 families
the variance of savings is one-quarter of the variance of incomes and the correlation
is found to be 0.4. Obtain the estimate of m.

[I.C.W.A. (Final) June 1974]
Ans. $m = 5$


12. What do you mean by Standard Error (S.E.) of an estimate? Give
expressions for the S.E. of estimate of y for given x and S.E. of estimate of x for
given y, assuming linear regression between x and y.

13. (i) Define the standard error of estimate of the linear regression of
Y on X, and express it in terms of the correlation coefficient, r.

(ii) What is the standard error of estimating Y from X if r = 1?
[Delhi U. B.A. (Econ. Hons.) 1981]

14. Given the standard deviations $\sigma_x$ and $\sigma_y$ for two correlated variates x
and y in a large sample :

(a) What is the standard error in estimating y from x if r = 0?
(b) By how much is the error reduced if r is increased to 0.5?
(c) What is the standard error in estimating y from x if r = 1?

Ans. (a) $S_{yx} = \sigma_y$ (b) $0.134\,\sigma_y$ (c) $S_{yx} = 0$


15. Obtain the lines of regression for the following bivariate frequency
distribution :

Sales Revenue          Advertisement Expenditure ('000 Rs.)
('000 Rs.)         5–15    15–25    25–35    35–45    Total
75–125               4       1        -        -        5
125–175              7       6        2        1       16
175–225              1       3        4        2       10
225–275              1       1        3        4        9
Total               13      11        9        7       40

(Delhi Uni. M.B.A. 1973)
Ans. Lines of regression are :
$x = 0.134y - 1.45$ and $y = 2.65x + 119.13$, where x denotes advertisement
expenditure and y denotes sales.

(OBJECTIVE TYPE QUESTIONS)

1. If two regression coefficients are 0.8 and 1.2, what would be the value
of the coefficient of correlation? [Mysore Uni. B.Com., 1976]

2. Given $b_{yx} = -1.4$ and $b_{xy} = -0.5$, calculate $r_{xy}$.
Ans. $r_{xy} = -0.84$

3. (a) Comment on the following :
For a bivariate distribution, the coefficient of regression of y on x is 4.2
and the coefficient of regression of x on y is 0.5.
[I.C.W.A. (Final) December 1977]

(b) If two regression coefficients are 0.8 and 0.6, what would be the
value of the coefficient of correlation?
[Rajasthan Uni. M.Com., 1976]

(c) A student while studying correlation between smoking and drinking
found a value of r = 2.46. Discuss.
(d) For a bivariate distribution
$b_{yx} = 2.8$ ; $b_{xy} = -0.3$
Comment.

Ans. (a) $r^2 = 4.2 \times 0.5 = 2.1 > 1$. Statement is wrong. (b) 0.69 (c) Wrong,
since $-1 \le r \le 1$. (d) Wrong, since both the regression coefficients must have the
same sign.


4. With $b_{xy} = 0.5$, $r = 0.8$ and variance of Y = 16, the standard deviation
of X equals ...
(a) 2.5 (b) 6.4 (c) 10.0 (d) 25.6
Ans. $\sigma_x = 2.5$ [C.A. (Intermediate) November 1983]
5. Given the regression coefficients of x on y and y on x as 0.85 and 0.89,
find the coefficient of correlation.
[Delhi U. B.A. (Econ. Hons.) 1982]
6. From the following regression equations, find $\bar{x}$ and $\bar{y}$ :
Y on X : $2Y - X - 50 = 0$
X on Y : $3Y - 2X - 10 = 0$
Ans. $\bar{x} = 130$, $\bar{y} = 90$

7. A student obtained the two regression lines as :
$2x - 5y - 7 = 0$
and $3x + 2y - 8 = 0$
Do you agree with him? [I.C.W.A. (Final) Dec. 1981]

Ans. No. $b_{yx} = 2/5$, $b_{xy} = -2/3$. Impossible, because both the regression
coefficients must have the same sign.

8. Comment on the following statements :
(i) The correlation coefficient between X and Y ($r_{xy}$) is 0.90 and the
regression coefficient $\beta_{xy}$ is −1.
[I.C.W.A. (Final) June 1979]
(ii) If the two coefficients of regression are negative then their correlation
coefficient is positive. [I.C.W.A. (Final) Dec. 1979]
(iii) $r_{xy} = 0.9$, $\beta_{xy} = 2.04$, $\beta_{yx} = -3.2$
[I.C.W.A. (Final) Dec. 1980]

Ans. (i) Wrong, (ii) Wrong, (iii) Wrong [$r_{xy}$, $\beta_{xy}$ and $\beta_{yx}$ must have
the same sign].

9. Discuss briefly the importance of regression analysis. Interpret the
following values :

(i) Product-moment coefficient of correlation is 0.
(ii) Regression coefficient of Y on X is −1.75
(iii) Coefficient of rank correlation = 1
[Bombay U. B.Com., April 1982]

10. (i) “The regression equations of Y on X and X on Y are irreversible.”
Explain.

(ii) “A correlation coefficient r = 0.8 indicates a relationship twice as
close as r = 0.4.” Comment.

(iii) “Even a high degree of correlation does not mean that a relationship
of cause and effect exists between the two correlated variables.”
Why?
[Delhi U. B.A. (Econ. Hons. I), 1985]


10 


Index Numbers 


10.1. Introduction. Index numbers are indicators which
reflect the relative changes in the level of a certain phenomenon in
any given period (or over a specified period of time), called the
current period, with respect to its values in some fixed period, called
the base period, selected for comparison. The phenomenon or variable
under consideration may be :

(i) The price of a particular commodity like steel, gold,
leather, etc., or a group of commodities like consumer goods,
cereals, milk and milk products, cosmetics, etc.

(ii) Volume of trade, factory production, industrial or agricultural
production, imports or exports, stocks and shares, sales and
profits of a business house and so on.

(iii) The national income of a country, wage structure of
workers in various sectors, bank deposits, foreign exchange reserves,
cost of living of persons of a particular community, class or profession
and so on.

Definition. “Index numbers are statistical devices designed
to measure the relative change in the level of a phenomenon (variable
or a group of variables) with respect to time, geographical location
or other characteristics such as income, profession, etc.” In other
words, index numbers are specialised types of rates, ratios and percentages
which give the general level of magnitude of a group of
distinct but related variables in two or more situations.


For example, suppose we are interested in studying the
general change in the price level of consumer goods, i.e., goods or
commodities consumed by the people belonging to a particular
section of society, say, low income group or middle income group
or labour class and so on. Obviously these changes are not directly

are quoted in Rs. per quintal or kg; water in Rs. per gallon; milk,
petrol, kerosene, etc., in Rs. per litre; cloth in Rs. per metre and so
on.

Further, the prices of some of the commodities may be increasing
while those of others may be decreasing during the two
periods, and the rates of increase or decrease may be different for
different commodities. An index number is a statistical device which
enables us to arrive at a single representative figure which gives the
general level of the price of the phenomenon (commodities) in an
extensive group. According to Wheldon:


“Index number is a statistical device for indicating the relative
movements of the data where measurement of actual movements is
difficult or incapable of being made.”

F.Y. Edgeworth gave the classical definition of index numbers
as follows:

“Index number shows by its variations the changes in a magnitude
which is not susceptible either of accurate measurement in itself
or of direct valuation in practice.”


10.2. Uses of Index Numbers. The first index number was 
constructed by an Italian, Mr. Carli, in 1764 to compare the changes 
in price for the year 1750 (current year) with the price level in 1500 
(base year) in order to study the effect of discovery of America on 
the price level in Italy. Though originally designed to study the
general level of prices or, accordingly, the purchasing power of money,
today index numbers are extensively used for a variety of purposes
in economics, business, management, etc., and for quantitative data
relating to production, consumption, profits, personnel and financial
matters, etc., for comparing changes in the level of a phenomenon for
two periods, places, etc. In fact there is hardly any field of
quantitative measurements where index numbers are not constructed. They
are used in almost all sciences: natural, social and physical. The
main uses of index numbers can be summarised as follows:


1. Index Numbers as Economic Barometers. Index numbers
are indispensable tools for the management personnel of any government
organisation or individual business concern and in business
planning and formulation of executive decisions. The indices of
prices (wholesale and retail), output (volume of trade, import and
export, industrial and agricultural production) and bank deposits,
foreign exchange and reserves, etc., throw light on the nature of,
and variation in, the general economic and business activity of the
country. A careful study of these indices gives us a fairly good
appraisal of the general trade, economic development and business
activity of the country. In the words of G. Simpson and F. Kafka:


“Index numbers are today one of the most widely used statistical
devices... They are used to take the pulse of the economy and they
have come to be used as indicators of inflationary or deflationary
tendencies.”

Like barometers which are used in Physics and Chemistry to
measure atmospheric pressure, index numbers are rightly termed
‘economic barometers’ or ‘barometers of economic activity’, which
measure the pressure of economic and business behaviour.


2. Index Numbers Help in Studying Trends and Tendencies.
Since the index numbers study the relative changes in the level of a
phenomenon at different periods of time, they are specially useful
for the study of the general trend for a group of phenomena in time
series data. The indices of output (industrial and agricultural
production), volume of trade, import and export, etc., are extremely
useful for studying the changes in the level of a phenomenon due to
the various components of a time series, viz., secular trend, seasonal
and cyclical variations and irregular components, and reflect upon
the general trend of production and business activity. As a measure
of average change in an extensive group, the index numbers can be used
to forecast future events. For instance, if a businessman is interested
in establishing a new industry, the study of the trend of changes
in the prices, wages and incomes in different industries is
extremely helpful to him to frame a general idea of the comparative
courses which the future holds for different undertakings.


3. Index Numbers Help in Formulating Decisions and Policies. 
Index numbers of the data relating to prices, production, profits, 
imports and exports, personnel and financial matters are indispens- 
able for any organisation in efficient planning and formulation of 
executive decisions. For example, the cost of living index numbers 
are used by the government and the industrial and business concerns 
for the regulation of dearness allowance (D.A.) or grant of bonus 
to the workers so as to enable them to meet the increased cost of 
living from time to time. The excise duty on the production or sales 
of a commodity is regulated according to the index numbers of the 
consumption of the commodity from time to time. Similarly, the 
indices of consumption of various commodities help in the planning 
of their future production. Although index numbers are now widely 
used to study the general economic and business conditions of the 
society, they are also applied with advantage by sociologists (popu- 
lation indices), psychologists (I. Q.’s), health and educational 
authorities etc., for formulating and revising their policies from 
time to time. 


4. Price Indices Measure the Purchasing Power of Money. The
cost of living index numbers determine whether the real wages are
rising or falling, money wages remaining unchanged. In other
words, they help us in computing the real wages, which are obtained
on dividing the money wages by the corresponding price index and
multiplying by 100. Real wages help us in determining the purchasing
power of money. For example, suppose that the cost of
living index for any year, say, 1979 for a particular class of people
with 1970 as base year is 150. If a person belonging to that class
gets Rs. 300 in 1970, then in order to maintain the same standard
of living as in 1970 (other factors remaining constant) his salary in
1979 should be $\frac{150}{100} \times 300$ = Rs. 450. In other words, if a person
gets Rs. 450 in 1979, then his real wages are $\frac{450}{150} \times 100$ =
Rs. 300, i.e., the purchasing power of money has reduced to 2/3.
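The arithmetic of this example can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `required_wage`, `real_wage` and `purchasing_power` are hypothetical labels chosen here and are not notation used in the text; the figures (Rs. 300 and an index of 150) are those of the example above.

```python
def required_wage(base_wage: float, price_index: float) -> float:
    """Money wage needed now to keep the base-year standard of living."""
    return base_wage * price_index / 100.0


def real_wage(money_wage: float, price_index: float) -> float:
    """Real wage = (money wage / price index) * 100."""
    return money_wage / price_index * 100.0


def purchasing_power(price_index: float) -> float:
    """Purchasing power of one rupee relative to the base year."""
    return 100.0 / price_index


# Figures from the example: cost of living index for 1979 (base 1970) is 150.
needed_1979 = required_wage(300, 150)   # salary needed in 1979
real_1979 = real_wage(450, 150)         # real value of Rs. 450 earned in 1979
power_1979 = purchasing_power(150)      # fraction of base-year purchasing power

print(needed_1979, real_1979, round(power_1979, 4))
```

The same `real_wage` function, applied to nominal income or nominal sales with an appropriate index, gives the corresponding deflated figure.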


5. Index Numbers are Used for Deflation. Consumer price 
indices or cost of living index numbers are used for deflation of net 
national product, income value series in national accounts. The 
technique of obtaining real wages from the given nominal wages 
(as explained in use 4 above) can be used to find real income from 
inflated money income, real sales from nominal sales and so on by 
taking into account appropriate index numbers. 


For a detailed discussion of (4) and (5) see §10.8.3.


10.3. Types of Index Numbers. Index numbers may be broadly 
classified into various categories depending upon the type of the 
phenomenon or variable in which the relative changes are to be 
studied. Although index numbers can be constructed for measur- 
ing relative changes in any field of quantitative measurement, we 
shall primarily confine the discussion to the data relating to econo- 
mics and business i.e., data relating to prices, production (output) 
and consumption. In this context index numbers may be broadly 
classified into the following three categories : 


1. Price Index Numbers. The price index numbers measure 
the general changes in the prices. They are further sub-divided into 
the following classes : 


(a) Wholesale Price Index Numbers. The wholesale price
index numbers reflect the changes in the general price level of
a country. The official general purpose index number of wholesale
prices in India was first compiled by the Economic Adviser, Ministry
of Commerce and Industry (now the Ministry of Commerce) in
1947 (with year ending August 1939 as base year) and a revised series
was started in April 1956 (with 1952-53 as base year). The new
series of index number of wholesale prices (1961-62 base year) was
started on the recommendations of the ‘Wholesale Price Index Revision
Committee’. It covered 139 commodities, 225 markets and 774
quotations. The latest wholesale price index number in India is
constructed with 1970-71 as base year.


(b) Retail Price Index Numbers. These indices reflect the
general changes in the retail prices of various commodities such as
consumption goods, stocks and shares, bank deposits, government
bonds, etc. In India, these indices are constructed by the Labour
Ministry in the form of the Labour Bureau Index Number of Retail Prices:
Urban Centres and Rural Centres.

Consumer Price Index, commonly known as the Cost of Living
Index, is a specialised kind of retail price index and enables us to
study the effect of changes in the prices of a basket of goods or
commodities on the purchasing power or cost of living of a particular
class or section of the people like labour class, industrial or
agricultural worker, low income or middle income class, etc. In
India, cost of living index numbers are available for (i) Central
Government employees, (ii) middle class people and (iii) working
class.


2. Quantity Index Numbers. Quantity index numbers study 
the changes in the volume of goods produced (manufactured), 
consumed or distributed, like the indices of agricultural production, 
industrial production, imports and exports, etc. They are extremely 
helpful in studying the level of physical output in an economy. 


3. Value Index Numbers. These are intended to study the
change in the total value (price multiplied by quantity) of produc- 
tion such as indices of retail sales or profits or inventories. How- 
ever, these indices are not as common as price and quantity 
indices. 


10.4. Problems in the Construction of Index Numbers. As
already pointed out, index numbers are very powerful statistical
tools for measuring the changes in the level of any phenomenon
over two different periods of time. It is, therefore, imperative that
utmost care is exercised in the computation and construction of
these indices. Index numbers which are not properly compiled
will not only lead to wrong and fallacious conclusions but
might also prove to be dangerous. The construction of index
numbers requires a careful study of the following points, which may
be termed as preliminaries to the construction of index numbers.


1. The Purpose of Index Numbers. The first and the foremost
problem in the construction of index numbers is to define in clear
and concrete terms the objective or the purpose for which the index
number is required. The purpose of the index would help in deciding
about the nature of the statistics (data) to be collected and the statistical
techniques (formulae) to be used, and also has a determining effect
on some other related problems like the selection of commodities,
selection of base period, the average to be used and so on. For
example, if we want to study the changes in the cost of living (i.e.,
consumer price index), the class of people for which the index is
designed, viz., agricultural or industrial workers, low income group,
middle income group, etc., should be clearly specified because the
consumption pattern of the commodities by the people of different
classes varies considerably. Similarly, if the objective is to study
the general changes in the price level in a country, then the price
quotations are to be obtained from the wholesale market and
a relatively large number of commodities or items are to be included
in its construction as compared with the number of items in the
construction of cost of living index for a specified class of people.
In the absence of the purpose of the index being clearly specified
we are liable to collect some irrelevant information which may
never be used and also omit some important data or items, which
might ultimately lead to fallacious conclusions and wastage of
resources.


2. Selection of Commodities or Items. Once the purpose of the
index is explicitly specified, the next problem is the selection of the
commodities or items to be used for its construction. In the selection
of commodities the following points may be borne in mind:

luxury items like scooter, television, refrigerator, etc., will have no
relevance. The commodities should thus be representative of the
habits, tastes, customs and consumption pattern of the class of
people for whom the index is intended.


As already pointed out, index number gives the general level 
of a phenomenon in an extensive group. It is practically impossible 
to take into account all the items in the group. For example, in the 
construction of a price index, from a technical point of view we should
study the price changes in all the items or commodities. However,
from a practical point of view it is neither possible nor desirable to
take into account all the items. We resort to sampling and only a 
few representative items are selected from the whole lot. The ideal 
solution lies in : 


(a) Classifying the whole relevant group of items or commodities
into relatively homogeneous subgroups like (i) Food [cereals:
rice, wheat, pulses, grams, etc.; milk and milk products; fruits;
vegetables; meat, poultry and fish; bakery products and so on],
(ii) Clothing, (iii) Fuel and Lighting (including electrical appliances),
(iv) House rent, (v) Miscellaneous (including items like education,
entertainment, medical expenses, washerman, newspaper, etc.).


(b) Selecting an adequate number of representative items from
each group (so that the final sample is a stratified sample and not a
random sample). Further, within each group the more important
items of consumption by the particular group of people are selected
first, and from among the remaining items as many more items are
selected as our resources (in terms of time, money and administration)
permit. Thus, even within each stratum (sub-group) the sample
drawn is not random.


(ii) The total number of commodities selected for the index
should neither be too small nor too large, because if it is too small,
then the index number will not be representative and if it is too
large, then the computational work may be uneconomical in terms
of time and money and may even be tedious. The number of commodities
selected should be fairly large, consistent with the ease of
handling and computation.

(iii) In order to arrive at meaningful and valid comparisons
it is essential that the commodities selected for the construction of
the index number are of the same quality or grade in different periods,
or in other words they remain more or less stable in quality
for reasonably long periods. Hence, in order to avoid any confusion
about the quality of commodities due to time lag, graded or
standardised items or commodities should be selected as far as
possible.


special reports from producers, exporters, etc., or in the absence of
all these through a reliable and unbiased field agency. The basic
principles of data collection, viz., accuracy, suitability or comparability
and adequacy, should be kept in mind in using secondary data.
[For details see §2.6, page 56.] Above all, the data collected must
be relevant to the purpose of the index. For example, if we want
to study the changes in the general price level in a country, then the
price quotations for the selected commodities must be obtained

supply the price quotations from time to time on a regular basis, since
price indices are often computed yearly, monthly and even weekly.


(i) Base period should be a period of normal and stable economic
conditions, i.e., it should be free from all sorts of abnormalities
and random or irregular fluctuations like earthquakes, wars, floods,
famines, labour strikes, lockouts, economic boom and depression.
For instance, if the base period is taken as a period of economic
boom so that prices of various goods and commodities are very low,
then the index will be over-stated, while if the base period is a period
of depression or economic instability, so that the prices of consumption
goods are abnormally high, then the index will be under-stated.
However, the selection of a strictly normal period is not an easy
job. A period which is normal in one respect may be abnormal in
some other respect. Accordingly, sometimes an average of two or
more years is taken as base period and the average prices and quantities
of the commodities consumed in these years are taken as base
year prices and quantities.


(ii) The base period should not be too distant from the given 
period. Due to rapid and dynamic pace of events these days it is 
desirable that the base period should not be very far off from the 
current period because the comparisons are valid and meaningful 
if they are made between two periods with relatively familiar set of 
circumstances. If the time lag is too much between the current and 
the base periods then it is very likely that there may be an appreci- 
able change in the tastes, customs, habits, and fashions of the 
people, thereby, affecting the consumption pattern of the various 
commodities to a marked extent. It is also possible that during 
this long period some of the goods or commodities consumed in the 
base year have become obsolete or outdated and have been replaced 
by new commodities of better quality. In such situations compari- 
son will be very difficult to make. Keeping this point in view the 
base period in the Economic Adviser's Index Number of Wholesale
Prices in India has been recently shifted from 1960-61 to 1970-71.
Similarly for the grant of dearness allowance (D.A.) increment to 
the workers, the prices should be compared with the period when 
last D.A. was granted or announced. 


(iii) Fixed Base or Chain Base. If the period of comparison is
kept fixed for all current years, it is called fixed base period. However,
because of the points raised in (ii) above, sometimes the chain-base
method is used, in which the changes in the prices for any
given year are compared with the prices in the preceding year.
[For details see Chain-Base Index Numbers discussed later in this
chapter.]
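The difference between the two kinds of base can be illustrated with a short Python sketch for a single commodity. The yearly prices below are hypothetical, and the function names `fixed_base_relatives` and `link_relatives` are labels assumed here for illustration; the full chain-base method is the one treated later in the chapter.

```python
# Hypothetical yearly prices (Rs.) of one commodity.
prices = {1978: 20.0, 1979: 25.0, 1980: 30.0, 1981: 27.0}


def fixed_base_relatives(prices, base_year):
    """Each year's price as a percentage of the price in one fixed base year."""
    base = prices[base_year]
    return {year: price / base * 100.0 for year, price in sorted(prices.items())}


def link_relatives(prices):
    """Each year's price as a percentage of the preceding year's price."""
    years = sorted(prices)
    return {cur: prices[cur] / prices[prev] * 100.0
            for prev, cur in zip(years, years[1:])}


fixed = fixed_base_relatives(prices, base_year=1978)
links = link_relatives(prices)

for year in sorted(prices):
    print(year, round(fixed[year], 2), round(links.get(year, float("nan")), 2))
```

With a fixed base every year is compared with 1978, whereas each link relative compares a year only with the year just before it, so the comparison is always between two adjacent periods.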

5. Type of Average to be Used. The changes in the prices
of various commodities have to be combined to arrive at a single
index which will reflect the average change in the price level of the
commodities in the composite group. This is done by averaging
them. Since index numbers are specialised averages, a judicious
selection of the average to be used for their construction is of great
importance. The commonly used averages are:

(i) Arithmetic Mean (A.M.)
(ii) Geometric Mean (G.M.)
(iii) Median.


Median is the easiest of all the three to calculate but since it
completely ignores the extreme observations and is more affected by
a few middle items, it is seldom used. Arithmetic mean is also not
recommended theoretically as it is very much affected by extreme
observations. However, from theoretical considerations, geometric
mean is the most appropriate average in this case because:

(i) In index numbers we deal with ratios and relative changes,
and geometric mean gives equal weights to equal ratios of change
[G.M. of ratios = Ratio of G.M.'s]. For example, if the price of
a commodity is doubled and that of the other is halved, then the
geometric mean is not affected while the arithmetic mean will show
an increase of 25%.

(ii) It gives more importance to small items and less importance
to large items and is, therefore, not unduly affected by
extreme and violent fluctuations in the observations.

(iii) Index numbers based on geometric mean are reversible
[See Time Reversal Test].

Hence from theoretic considerations, for the sake of greater
accuracy and precision, geometric mean should be preferred.
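The doubling-and-halving illustration in point (i) can be checked with a small Python sketch. The two price relatives (200 and 50) correspond to one price doubling and the other halving; the helper names `arithmetic_mean` and `geometric_mean` are assumed here purely for illustration.

```python
import math


def arithmetic_mean(values):
    """Ordinary average of the values."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Price relatives (current price as % of base price):
# one commodity doubles in price (200), the other halves (50).
relatives = [200.0, 50.0]

am = arithmetic_mean(relatives)
gm = geometric_mean(relatives)

print(f"A.M. of relatives: {am:.2f}")
print(f"G.M. of relatives: {gm:.2f}")
```

The arithmetic mean of the relatives exceeds 100 by 25, while the geometric mean stays at 100 (up to floating-point rounding), which is the sense in which equal ratios of change receive equal weight.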


However, in practice, because of its computational difficulties,

mean each of which is commonly used in practice but gives
different figures for the index.


6. System of Weighting. The commodities included for the
construction of index numbers like food, clothing, housing, light
and fuel, etc., are not of equal importance. In order that the index
is representative of the average changes in the level of a phenomenon
for the composite group, proper weights should be assigned to different
commodities according to their relative importance in the
group. Thus, in practice, we may have two types of index numbers.


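(i) Un-weighted Index Numbers. The index numbers constructed
without assigning any weights to different items are called
un-weighted index numbers.

(ii) Weighted Index Numbers. The index numbers constructed by
assigning weights to the different items according to their relative
importance in the group. In fact un-weighted index numbers can
also be looked upon as weighted index numbers where the weight
of each commodity is unity.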


and constitutes an important aspect of the construction of index
numbers. The weights may be assigned to the items
in any manner deemed appropriate to bring out their relative
importance. For example, the production
figures or distribution figures may be taken as weights. The most
commonly adopted systems of weighting are:


(i) Quantity weights, in which the various commodities are
attached importance according to the amount of their quantity
used, purchased or consumed, and

(ii) Value weights, in which the importance to the various
items is assigned according to the expenditure involved on them.

The choice of different systems of weighting w.r.t. the quantities
consumed or the total values in the base year or the current
year, or sometimes their arithmetic or geometric crosses, gives rise to
a number of formulae for the construction of index numbers, discussed
in §10.5, and very much depends on the purpose of the index
and availability of the data.


Regarding the system of weighting to be adopted for constructing
index numbers, it is worthwhile to quote the words of A.L.
Bowley:

“The discussion of proper weights to be used has occupied a
space in statistical literature out of all proportions to its significance,
for it may be said at once that no great importance need be attached
to the special choice of weights; one of the most convenient facts
of statistical theory is that, given certain conditions, the same result
is obtained with sufficient closeness whatever logical system of
weights is applied.”

However, he was not totally against weighting and suggested
the arithmetic cross of Laspeyre's and Paasche's formulae, discussed
in the next section, §10.5.2.

7. Choice of Formula. The choice of the formula to be
used depends on the availability of the data regarding the prices and
the quantities of the selected commodities in the base and/or current
year. Before discussing various formulae, we give below notations
and terminology.


Notations and Terminology

Base Year. The year selected for comparison, i.e., the year
w.r.t. which comparisons are made. It is denoted by the suffix
zero '0'.

Current Year. The year for which comparisons are sought or
required. It is denoted by the suffix 1.

$p_0$ : Price of a commodity in the base year.

$p_1$ : Price of a commodity in the current year.

$q_0$ : Quantity of a commodity consumed or purchased during
the base year.

$q_1$ : Quantity of a commodity consumed or purchased in the
current year.

$w$ : Weight assigned to a commodity according to its relative
importance in the group.

$I$ : Simple Index Number or Price Relative obtained on expressing
current year price as a percentage of the base
year price and is given by

$$I = \text{Price Relative} = \frac{p_1}{p_0} \times 100 \qquad (10.1)$$

$P_{01}$ : Price Index Number for the current year w.r.t. the base
year.

$P_{10}$ : Price Index Number for the base year w.r.t. the current
year.

$Q_{01}$ : Quantity Index Number for the current year w.r.t. the
base year.

$Q_{10}$ : Quantity Index Number for the base year w.r.t. the
current year.

$V_{01}$ : Value Index for the current year w.r.t. the base year.

Remark. To be more precise and specific we have:

$p_{0j}$ : Price of the jth commodity in the base year, j = 1, 2, ...,
n (say).

$p_{1j}$ : Price of the jth commodity in the current year.

Similarly $q_{0j}$ and $q_{1j}$ are the quantities of the jth commodity in
the base year and the current year respectively.


$\sum_{j=1}^{n} p_{0j}$ is the total price of all the n commodities in the base year
and $\sum_{j=1}^{n} q_{0j}$ is the total quantity of all the commodities consumed in
the base year. Similarly

$$\sum_{j=1}^{n} p_{0j}\, q_{0j}$$

is the total value of all the commodities consumed in the base year.
However, for the sake of notational convenience we shall write

$$\sum_{j=1}^{n} p_{0j} = \sum p_0 ; \quad \sum_{j=1}^{n} q_{1j} = \sum q_1 ; \quad \sum_{j=1}^{n} p_{0j}\, q_{0j} = \sum p_0 q_0 ; \quad \sum_{j=1}^{n} p_{1j}\, q_{1j} = \sum p_1 q_1$$

and so on, the summation being taken over the n selected commodities.


10.5. Methods of Constructing Index Numbers. We shall now
discuss the various techniques or methods used for the construction
of index numbers.

Since price indices are the most important of all the indices, we
shall describe their construction in detail in the following section.
The quantity indices can be obtained from price indices by interchanging
the price (p) and quantity (q) in the final formula.


10.5.1 Simple (Un-Weighted) Aggregate Method. This is the
simplest of all the methods of constructing index numbers and consists
in expressing the total price, i.e., aggregate of prices (of all the
selected commodities) in the current year as a percentage of the
aggregate of prices in the base year. Thus the price index for the
current year w.r.t. the base year is given by:

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 \qquad (10.2)$$

where

$\sum p_1$ is the aggregate of prices (of all the selected commodities)
in the current year and

$\sum p_0$ is the aggregate of prices in the base year.


This method, though simple, is not reliable and has the following
limitations:

(i) The prices of various commodities may be quoted in
different units, e.g., cereals may be quoted in Rs. per quintal or kg;
liquids like milk, petrol, kerosene may be quoted in Rs. per litre;
cloth may be quoted in Rs. per metre and so on. Thus the index is
influenced very much by the units in which commodities are quoted
and accordingly some of the commodities may get more importance
because they are quoted in a particular unit. For example, if the wheat
price is quoted in Rs. per kg the index would be entirely different
than if it is quoted in Rs. per quintal; the latter representation will
very much emphasise its importance. This index is liable to be
misused since unscrupulous and selfish persons might manipulate its
value to suit their requirements by changing the units of measurement
of some of the items from 100 gms to kg, from kg to quintal
and so on.

(ii) The relative importance of the various commodities is not
taken into consideration.
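Limitation (i) can be demonstrated with a short Python sketch. The wheat and milk prices below are hypothetical figures assumed only for illustration, and `simple_aggregate_index` is a name chosen here for formula (10.2).

```python
def simple_aggregate_index(base_prices, current_prices):
    """Simple aggregate price index: (sum of current / sum of base) * 100."""
    return sum(current_prices) / sum(base_prices) * 100.0


# Hypothetical prices in the order [wheat, milk]; milk is in Rs. per litre.
# Case 1: wheat quoted in Rs. per kg.
base_per_kg = [2.0, 3.0]
current_per_kg = [3.0, 3.0]
index_per_kg = simple_aggregate_index(base_per_kg, current_per_kg)

# Case 2: the same wheat prices quoted in Rs. per quintal (1 quintal = 100 kg).
base_per_quintal = [200.0, 3.0]
current_per_quintal = [300.0, 3.0]
index_per_quintal = simple_aggregate_index(base_per_quintal, current_per_quintal)

print(f"wheat per kg:      {index_per_kg:.2f}")
print(f"wheat per quintal: {index_per_quintal:.2f}")
```

Nothing about the market has changed between the two cases, yet the index rises from about 120 to about 149 merely because wheat is quoted in a larger unit and so dominates the aggregate.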


Remark. Based on this method, the quantity index is given
by the formula:

$$Q_{01} = \frac{\sum q_1}{\sum q_0} \times 100 \qquad (10.3)$$

where $\sum q_0$ and $\sum q_1$ are the total quantities of all the selected
commodities consumed in the base year and the current year respectively.


Example 10.1. From the following data calculate Index Number
by Simple Aggregate Method.

Commodity    Price in 1980 (Rs.)    Price in 1981 (Rs.)
    A               162                    171
    B               256                    164
    C               257                    189
    D               132                    145

(Andhra Pradesh U. B.Com., April 1982)

Solution.

COMPUTATION OF PRICE INDEX NUMBER

                     Price (in Rupees)
Commodity      1980 ($p_0$)      1981 ($p_1$)
    A               162               171
    B               256               164
    C               257               189
    D               132               145
  Total       $\sum p_0$ = 807   $\sum p_1$ = 669

The price index number using Simple Aggregate Method is
given by:

$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{669}{807} \times 100 = 82.90$$
[...] purchased or marketed. If $w$ is the weight attached to a commodity,
then the price index is given by:

$$P_{01} = \frac{\sum w p_1}{\sum w p_0} \times 100 \qquad (10.4)$$


By using different systems of weighting we get a number of
formulae. Some of the important formulae are given below:

Laspeyre's Price Index or Base Year Method. Taking base
year quantities as weights, i.e., $w = q_0$ in (10.4), we get Laspeyre's
Price Index given by:

$$P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \qquad (10.5)$$


This formula was devised by French Economist Laspeyre in


Paasche's Price Index. If we take current year quantities as
weights in (10.4) we obtain Paasche's Price Index, which is
given by:

$$P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 \qquad (10.6)$$

This formula was given by German Statistician Paasche in
1874.


Dorbish-Bowley Price Index. This index is given by the arithmetic
mean of Laspeyre's and Paasche's price index numbers and
we have:

$$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100 \qquad (10.7)$$

This is also sometimes known as L-P formula.

Fisher's Price Index. Irving Fisher advocated the geometric
cross of Laspeyre's and Paasche's price index numbers, which is
given by:

$$P_{01}^{F} = \left[P_{01}^{La} \times P_{01}^{Pa}\right]^{1/2} = \left[\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}\right]^{1/2} \times 100 \qquad (10.8)$$

Fisher's index is termed as an Ideal index since it satisfies
time reversal and factor reversal tests for the consistency of index
numbers. (For details see § 10.6)
Marshall-Edgeworth Price Index. Taking the arithmetic
cross of the quantities in the base year and the current year as
weights, i.e., $w = (q_0 + q_1)/2$, we obtain the Marshall-Edgeworth
(M.E.) formula given by

$$P_{01}^{ME} = \frac{\sum p_1 (q_0 + q_1)/2}{\sum p_0 (q_0 + q_1)/2} \times 100 = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 \qquad (10.9)$$

$$= \left[\frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1}\right] \times 100 \qquad (10.9a)$$


Walsch Price Index. Instead of taking the arithmetic mean
of base year and current year quantities as weights, if we take their
geometric mean, i.e., $w = \sqrt{q_0 q_1}$, then we obtain Walsch Index
given by the formula:

$$P_{01}^{Wa} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100 \qquad (10.10)$$

Kelly's Price Index or Fixed Base Index. This formula, named
after Truman L. Kelly, requires the weights to be fixed for all
periods. It is also sometimes known as the aggregative index with fixed
weights and is given by the formula:

$$P_{01}^{K} = \frac{\sum p_1 q}{\sum p_0 q} \times 100 \qquad (10.11)$$

where the weights are the quantities ($q$) which may refer to
some period (not necessarily the base year or the current year) and
are kept constant for all periods. The average (A.M. or G.M.) of
the quantities consumed of two, three or more years may be used
as weights.


Kelly’s fixed base index has a distinct advantage over Las- 
peyre’s index because unlike Laspeyre’s index the change in the 
base year does not necessitate a corresponding change in the 
weights which can be kept constant until new data become avail- 
able to revise the index. As such, currently this index is finding 
great favour and becoming quite popular. The Labour Bureau 
wholesale price index in U.S.A. is based on this method. 


Remarks 1. In all the above formulae, the summation is
taken over the various commodities selected for the construction of
the index number.


2. Laspeyre's Index vs. Paasche's Index. Laspeyre's price
index is based on the assumption that the quantities consumed
in the base year and the current year are the same, an assumption
which is not true in general. If the consumption of some of the
commodities or items decreases in the current year due to a rise in
their prices or due to changes in the habits, tastes and customs of
the people, then Laspeyre's index, which is based on base year
quantities as weights, gives relatively more weightage to such
commodities (whose prices rise sharply) and consequently the numerator
in (10.5) is relatively larger. Hence Laspeyre's index is expected
to have an 'upward bias' as it over-estimates the true value. Similarly,
if the consumption of certain commodities increases in the current
year due to a decrease in their prices (or changes in the tastes, habits
and customs of the people), then Paasche's index, which uses current
year quantities as weights, gives more weightage to such commodities
(whose prices decline much). Accordingly, Paasche's index has
a 'downward bias' and is expected to under-estimate the true value.
However, it should not be inferred that Laspeyre's index must
always be larger than Paasche's index. In this context it may be
worthwhile to quote the following words of Karmal.


“If the prices of all the goods change in the same ratio then 
Laspeyre’s and Paasche’s price index numbers will be equal, for then 
the weighting system is irrelevant ; or if the quantities of all the goods 
change in the same ratio, they will be equal for then the two weighting
systems are the same relatively.”


In general, the true value of the price index lies somewhere 
between the two. 
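The two special cases in the quotation can be checked numerically. The sketch below uses hypothetical prices and quantities; the function names `laspeyres` and `paasche` are illustrative and simply implement formulae (10.5) and (10.6).

```python
def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, q0, p1, q1):
    # Base year quantities as weights, formula (10.5)
    return aggregate(p1, q0) / aggregate(p0, q0) * 100


def paasche(p0, q0, p1, q1):
    # Current year quantities as weights, formula (10.6)
    return aggregate(p1, q1) / aggregate(p0, q1) * 100


# Hypothetical base year data
p0 = [4.0, 10.0, 2.0]
q0 = [5.0, 2.0, 8.0]

# Case 1: all prices change in the same ratio (here 1.5), quantities arbitrary.
p1_same_ratio = [1.5 * p for p in p0]
q1_any = [3.0, 4.0, 9.0]
case1 = (laspeyres(p0, q0, p1_same_ratio, q1_any),
         paasche(p0, q0, p1_same_ratio, q1_any))

# Case 2: all quantities change in the same ratio (here 2), prices arbitrary.
p1_any = [5.0, 9.0, 4.0]
q1_same_ratio = [2.0 * q for q in q0]
case2 = (laspeyres(p0, q0, p1_any, q1_same_ratio),
         paasche(p0, q0, p1_any, q1_same_ratio))

print(case1, case2)
```

In both cases the common factor cancels between numerator and denominator, which is why the weighting system becomes irrelevant.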


Since the weights change for every year, Paasche’s price index 
numbers require much more computational work as compared with 
Laspeyre’s price index numbers. 


3. Marshall-Edgeworth and Fisher’s Index Numbers. These 
formulae are a sort of compromise between Laspeyre’s price index 
(which has an upward bias) and Paasche’s price index (which has 
a downward bias) and have no bias in any known direction. They 
provide a better estimate of the true price index. However, since 
both these formulae require the base year and current year prices 
and quantities for their computation they have practical limitations 
because it is very difficult and rather expensive also to obtain 
correct information regarding these weights. Further, these for- 
mulae require much more computational work than Laspeyre's or 
Paasche's price index numbers. Moreover, although Fisher’s index 
is termed as ideal index since it satisfies Time Reversal and Factor 
Reversal tests for the consistency of index numbers (discussed later), 
it is rarely used in practice because of its computational difficulties,
and statisticians prefer to rely on the simple, though less exact,
Laspeyre's and Paasche's index numbers. It may be remarked that
both Fisher's Ideal index and the Marshall-Edgeworth index lie between
Laspeyre's and Paasche's indices.


4. Quantity Indices. As already discussed, quantity index 
numbers reflect the relative changes in the quantity or volume of 
goods produced, consumed, marketed or distributed in any given
year w.r.t. some base year. The formulae for quantity indices
are obtained from the formulae (10.4) to (10.11) on interchanging 
prices (p) and quantities (g). Thus, for example, 
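$$Q_{01}^{La} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{\sum p_0 q_1}{\sum p_0 q_0} \times 100 \qquad (10.12)$$

$$Q_{01}^{Pa} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{\sum p_1 q_1}{\sum p_1 q_0} \times 100 \qquad (10.13)$$

$$Q_{01}^{F} = \left[Q_{01}^{La} \times Q_{01}^{Pa}\right]^{1/2} = \left[\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}\right]^{1/2} \times 100 \qquad (10.14)$$

$$Q_{01}^{ME} = \frac{\sum q_1 (p_0 + p_1)}{\sum q_0 (p_0 + p_1)} \times 100 \qquad (10.15)$$

$$= \frac{\sum q_1 p_0 + \sum q_1 p_1}{\sum q_0 p_0 + \sum q_0 p_1} \times 100 \qquad (10.15a)$$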


and so on. 


5. Value Indices. Value index numbers are obtained on
expressing the total value (or expenditure) in any given year as a
percentage of the same in the base year. Symbolically, we write

$$V_{01} = \frac{\text{Total value in current year}}{\text{Total value in base year}} \times 100$$

$$\Rightarrow \quad V_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \times 100 \qquad (10.16)$$


We shall now discuss some numerical illustrations based on
the above formulae.


Example 10.2. From the data given below, construct Laspeyre's
and Paasche's price index numbers with base 1975:

                   1975                  1979
Commodity     Price   Quantity      Price   Quantity
A               4        2            6        3
B               3        5            2        1
C               8        2            4        6

[Delhi U. B. Com. (Hons.) 1980]

Solution.

CALCULATIONS FOR LASPEYRE'S AND PAASCHE'S
INDEX NUMBERS

                 1975                    1979
Commodity   Price   Quantity       Price   Quantity    $p_0 q_0$   $p_0 q_1$   $p_1 q_0$   $p_1 q_1$
            ($p_0$)  ($q_0$)       ($p_1$)  ($q_1$)
A             4        2             6        3            8          12          12          18
B             3        5             2        1           15           3          10           2
C             8        2             4        6           16          48           8          24

Total                                                     39          63          30          44

Laspeyre's and Paasche's Price Indices are given by:

$$P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{30}{39} \times 100 = 0.7692 \times 100 = 76.92$$

$$P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{44}{63} \times 100 = 0.6984 \times 100 = 69.84$$


Example 10.3. From the following data calculate price index 
numbers for 1980 with 1970 as base by (i) Laspeyre's method, 
(ii) Paasche's method, (iii) Marshall-Edgeworth method, and
(iv) Fisher’s ideal method : 


Commodity 1970 1980 
Price Quantity Price Quantity 
A 20 8 40 6 
B 50 10 60 5 
C 40 15 50 15 
D 20 20 20 25 
[Delhi U. B.Com. (Hons.) 4I 1985] 
Solution. 
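CALCULATIONS FOR PRICE INDICES BY
DIFFERENT FORMULAE

Commodities      1970            1980
              $p_0$   $q_0$   $p_1$   $q_1$   $p_0 q_0$   $p_0 q_1$   $p_1 q_0$   $p_1 q_1$
A              20      8       40      6        160         120         320         240
B              50     10       60      5        500         250         600         300
C              40     15       50     15        600         600         750         750
D              20     20       20     25        400         500         400         500

Total                                          1660        1470        2070        1790

so that $\sum p_0 q_0 = 1660$, $\sum p_0 q_1 = 1470$, $\sum p_1 q_0 = 2070$ and $\sum p_1 q_1 = 1790$.

(i) Laspeyre's Price Index

$$P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{2070}{1660} \times 100 = 1.24699 \times 100 = 124.699$$

(ii) Paasche's Price Index

$$P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1790}{1470} \times 100 = 1.2177 \times 100 = 121.77$$

(iii) Marshall-Edgeworth Price Index

$$P_{01}^{ME} = \left(\frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1}\right) \times 100 = \left(\frac{2070 + 1790}{1660 + 1470}\right) \times 100 = \frac{3860}{3130} \times 100 = 1.2332 \times 100 = 123.32$$

(iv) Fisher's Price Index

$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{2070}{1660} \times \frac{1790}{1470}} \times 100$$

$$= \sqrt{1.24699 \times 1.2177} \times 100 = \sqrt{1.51846} \times 100 = 1.23226 \times 100 = 123.23$$

Aliter:

$$P_{01}^{F} = \sqrt{P_{01}^{La} \times P_{01}^{Pa}} = \sqrt{124.699 \times 121.77} = \sqrt{15184.597} = 123.23$$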
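The four indices of Example 10.3 can be reproduced with a short script. This is a minimal sketch that applies formulae (10.5), (10.6), (10.9a) and (10.8) to the data of the example; the variable and function names are illustrative.

```python
import math

# Data of Example 10.3 (commodities A, B, C, D)
p0 = [20, 50, 40, 20]   # 1970 prices
q0 = [8, 10, 15, 20]    # 1970 quantities
p1 = [40, 60, 50, 20]   # 1980 prices
q1 = [6, 5, 15, 25]     # 1980 quantities


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


p0q0 = aggregate(p0, q0)
p0q1 = aggregate(p0, q1)
p1q0 = aggregate(p1, q0)
p1q1 = aggregate(p1, q1)

laspeyres = p1q0 / p0q0 * 100
paasche = p1q1 / p0q1 * 100
marshall_edgeworth = (p1q0 + p1q1) / (p0q0 + p0q1) * 100
fisher = math.sqrt(laspeyres * paasche)   # geometric mean of the two

for name, value in [("Laspeyre", laspeyres), ("Paasche", paasche),
                    ("Marshall-Edgeworth", marshall_edgeworth),
                    ("Fisher", fisher)]:
    print(f"{name}: {value:.2f}")
```

Computing the four aggregates once and reusing them mirrors the layout of the computation table.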


Example 10.4. From the data given below construct index
number of the group of four commodities by using Fisher's Ideal
Formula:

                      Base year                     Current year
Commodities   Price per     Expenditure     Price per     Expenditure
              unit (Rs.)      (Rs.)         unit (Rs.)      (Rs.)
A                 2             40              5             75
B                 4             16              8             40
C                 1             10              2             24
D                 5             25             10             60

[Delhi Uni. B.Com. (Hons.) II, 1984;
Bombay Uni. B.Com., April 1976]

Solution. In this problem we are given the expenditure ($e$)
and the prices ($p$) per unit for different commodities. We have

Expenditure = Price × Quantity

$$\Rightarrow \quad \text{Quantity} = \frac{\text{Expenditure}}{\text{Price}}, \quad \text{i.e.,} \quad q = \frac{e}{p} \qquad (*)$$

Using (*) we shall first obtain the quantity consumed for the base
year and the current year as given in the following table.

COMPUTATION OF FISHER'S IDEAL INDEX NUMBER

Commodities   $p_0$   $p_0 q_0$   $q_0$   $p_1$   $p_1 q_1$   $q_1$   $p_1 q_0$   $p_0 q_1$
A               2        40        20       5        75        15       100         30
B               4        16         4       8        40         5        32         20
C               1        10        10       2        24        12        20         12
D               5        25         5      10        60         6        50         30

$\sum p_0 q_0 = 91$, $\sum p_1 q_1 = 199$, $\sum p_1 q_0 = 202$, $\sum p_0 q_1 = 92$

Hence Fisher's Ideal price index is given by

$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{202 \times 199}{91 \times 92}} \times 100 = \sqrt{\frac{40198}{8372}} \times 100$$

$$= \sqrt{4.8015} \times 100 = 2.1912 \times 100 = 219.12$$
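When expenditures are given instead of quantities, the quantities can be derived first and the index computed as usual. The sketch below follows that route for the data of Example 10.4 (prices 2, 4, 1, 5 and 5, 8, 2, 10 with the stated expenditures); the names are illustrative.

```python
import math

# Data of Example 10.4: price per unit and expenditure for A, B, C, D
p0 = [2, 4, 1, 5]
e0 = [40, 16, 10, 25]    # base year expenditure = p0 * q0
p1 = [5, 8, 2, 10]
e1 = [75, 40, 24, 60]    # current year expenditure = p1 * q1

# Quantity = Expenditure / Price
q0 = [e / p for e, p in zip(e0, p0)]
q1 = [e / p for e, p in zip(e1, p1)]


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres_ratio = aggregate(p1, q0) / aggregate(p0, q0)
paasche_ratio = aggregate(p1, q1) / aggregate(p0, q1)
fisher = math.sqrt(laspeyres_ratio * paasche_ratio) * 100
print(q0, q1, round(fisher, 2))
```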


Example 10.5. It is stated that the Marshall-Edgeworth index
number is a good approximation to the ideal index number; verify
using the following data:

                   1970                  1974
Commodity     Price   Quantity      Price   Quantity
A               2        74           3        82
B               5       125           4       140
C               7        40           6        33

[U.C. W.A. (Final) June 1984; Bombay Uni. B.Com. April 1975]

Solution.

COMPUTATION OF MARSHALL-EDGEWORTH AND
FISHER'S IDEAL INDEX NUMBERS

Commodity   $p_0$   $q_0$   $p_1$   $q_1$   $p_0 q_0$   $p_0 q_1$   $p_1 q_0$   $p_1 q_1$
A             2      74       3      82        148         164         222         246
B             5     125       4     140        625         700         500         560
C             7      40       6      33        280         231         240         198

Total                                         1053        1095         962        1004

Marshall-Edgeworth price index number is given by:

$$P_{01}^{ME} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$

$$= \frac{962 + 1004}{1053 + 1095} \times 100 = \frac{1966}{2148} \times 100 = 91.527$$

Fisher's price index number is given by:

$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0 \times \sum p_1 q_1}{\sum p_0 q_0 \times \sum p_0 q_1}} \times 100 = \sqrt{\frac{962 \times 1004}{1053 \times 1095}} \times 100 = \sqrt{\frac{965848}{1153035}} \times 100$$

$$= \sqrt{0.837657} \times 100 = 0.91523 \times 100 = 91.523$$

Since $P_{01}^{ME}$ and $P_{01}^{F}$ are approximately equal, Marshall-Edgeworth
index number is a good approximation to Fisher's ideal
index number.


Example 10.6. Compute by Fisher's formula the Quantity
Index Number from the data given below:

                      1974                              1976
Articles   Price (Rs.)   Total Value (Rs.)   Price (Rs.)   Total Value (Rs.)
A              5               50                4               48
B              8               48                7               49
C              6               18                5               20

[Himachal Pradesh Uni. M.Com., July 1979;
Delhi Uni. B.Com. (Hons.) 1977]

Solution. Here we are given the total values ($v$) for the
current and base years which are given by:

Total value = Price × Quantity

$$\Rightarrow \quad v = p \times q \quad \Rightarrow \quad q = v/p \qquad (*)$$

Hence the quantity consumed for base and current years is obtained
on dividing the total value by the corresponding price.

COMPUTATION OF FISHER'S QUANTITY
INDEX NUMBER

Article   $p_0$   $v_0 = p_0 q_0$   $q_0$   $p_1$   $v_1 = p_1 q_1$   $q_1$   $p_0 q_1$   $p_1 q_0$
A           5           50           10       4           48           12        60          40
B           8           48            6       7           49            7        56          42
C           6           18            3       5           20            4        24          15

Total                  116                                117                    140          97

Fisher's quantity index number for 1976 with base year 1974
is given by the formula

$$Q_{01}^{F} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{\sum p_0 q_1 \times \sum p_1 q_1}{\sum p_0 q_0 \times \sum p_1 q_0}} \times 100$$

$$= \sqrt{\frac{140 \times 117}{116 \times 97}} \times 100 = \sqrt{\frac{16380}{11252}} \times 100 = \sqrt{1.4557} \times 100 = 1.2065 \times 100 = 120.65$$
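A quantity index is obtained from the corresponding price index by interchanging the roles of prices and quantities. The sketch below applies this to the data of Example 10.6 (base prices 5, 8, 6; current prices 4, 7, 5; with the stated total values); `fisher_quantity_index` is an illustrative name.

```python
import math


def aggregate(xs, ys):
    """Sum of pairwise products."""
    return sum(x * y for x, y in zip(xs, ys))


def fisher_quantity_index(p0, q0, p1, q1):
    """Geometric mean of Laspeyre's and Paasche's quantity indices, formula (10.14)."""
    laspeyres_q = aggregate(q1, p0) / aggregate(q0, p0)   # base year prices as weights
    paasche_q = aggregate(q1, p1) / aggregate(q0, p1)     # current year prices as weights
    return math.sqrt(laspeyres_q * paasche_q) * 100


# Data of Example 10.6: prices and total values for articles A, B, C
p0, v0 = [5, 8, 6], [50, 48, 18]
p1, v1 = [4, 7, 5], [48, 49, 20]

# Quantity = Total value / Price
q0 = [v / p for v, p in zip(v0, p0)]
q1 = [v / p for v, p in zip(v1, p1)]

index = fisher_quantity_index(p0, q0, p1, q1)
print(round(index, 2))
```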
Example 10.7. Given the data:

              Commodities
              A        B
$p_0$         1        1
$q_0$        10        5
$p_1$         2        $x$
$q_1$         5        2

where $p$ and $q$ respectively stand for price and quantity and subscripts
stand for time period, find $x$, if the ratio between Laspeyre's (L) and
Paasche's (P) index numbers is

L : P :: 28 : 27.

[Delhi Uni. M.A. (Econ.) 1973, 1970]

Solution.

COMPUTATION OF LASPEYRE'S AND PAASCHE'S
INDICES

Commodities   $p_0$   $q_0$   $p_1$   $q_1$   $p_0 q_0$   $p_0 q_1$   $p_1 q_0$   $p_1 q_1$
A               1      10       2       5        10           5          20          10
B               1       5      $x$      2         5           2         $5x$        $2x$

Total                                            15           7       $20 + 5x$   $10 + 2x$

We are given

$$\frac{P_{01}^{La}}{P_{01}^{Pa}} = \frac{28}{27} \qquad (*)$$

Now

$$P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \left(\frac{20 + 5x}{15}\right) \times 100 = \left(\frac{4 + x}{3}\right) \times 100$$

$$P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \left(\frac{10 + 2x}{7}\right) \times 100$$

Substituting in (*) we get:

$$\frac{(4 + x)/3}{(10 + 2x)/7} = \frac{28}{27} \quad \Rightarrow \quad \frac{7(4 + x)}{3(10 + 2x)} = \frac{28}{27}$$

$$\Rightarrow \quad \frac{4 + x}{10 + 2x} = \frac{4}{9} \quad \Rightarrow \quad 9(4 + x) = 4(10 + 2x)$$

$$\Rightarrow \quad 36 + 9x = 40 + 8x \quad \Rightarrow \quad x = 4$$
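The value of $x$ can be verified with exact rational arithmetic. The sketch below assumes the data of the example as read here (A: $p_0 = 1$, $q_0 = 10$, $p_1 = 2$, $q_1 = 5$; B: $p_0 = 1$, $q_0 = 5$, $p_1 = x$, $q_1 = 2$); the function name is illustrative.

```python
from fractions import Fraction


def laspeyres_paasche_ratio(x):
    """Ratio L/P for the two-commodity data, with x the unknown current price of B."""
    x = Fraction(x)
    laspeyres = (20 + 5 * x) / 15   # sum(p1*q0) / sum(p0*q0)
    paasche = (10 + 2 * x) / 7      # sum(p1*q1) / sum(p0*q1)
    return laspeyres / paasche


# Cross-multiplying 7(20 + 5x) / (15(10 + 2x)) = 28/27 gives the linear equation
# 27*7*(20 + 5x) = 28*15*(10 + 2x), which is solved for x below.
x = Fraction(28 * 15 * 10 - 27 * 7 * 20, 27 * 7 * 5 - 28 * 15 * 2)

print(x, laspeyres_paasche_ratio(x))
```

Using `Fraction` avoids rounding, so the ratio can be compared with 28/27 exactly.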
Example 10.8. Calculate the weighted price index from the
following data:

Materials        Unit         Quantity          Price during
required                      required         1963      1973
                                               (Rs.)     (Rs.)
Cement           100 lb.      500 lb.           5.0       8.0
Timber           c. ft.       2,000 c. ft.      9.5      14.2
Steel sheets     cwt.         50 cwt.          34.0      42.2
Bricks           per '000     20,000           12.0      24.0

[Delhi Uni. B.A., Econ. (Hons.) 1973]

Solution. Since the quantities (weights) required of different
materials are fixed for both the base and current years, we will use
Kelly's formula for finding out the price index.

Further, for cement the unit is 100 lbs. and the quantity required
is 500 lbs. Hence the quantity consumed per unit for cement is
500/100 = 5. Similarly, the quantity consumed per unit for bricks
is 20,000/1,000 = 20.

COMPUTATION OF KELLY'S INDEX NUMBER

Materials        Unit         Quantity                    Price during
required                      required          $q$      1963     1973     $q p_0$    $q p_1$
                                                        ($p_0$)  ($p_1$)
Cement           100 lb.      500 lb.             5       5.0      8.0        25         40
Timber           c. ft.       2,000 c. ft.     2000       9.5     14.2    19,000     28,400
Steel sheets     cwt.         50 cwt.            50      34.0     42.2     1,700      2,100
Bricks           per '000     20,000             20      12.0     24.0       240        480

Total                                                                     20,965     31,020

Kelly's Price Index is given by:

$$P_{01}^{K} = \frac{\sum q p_1}{\sum q p_0} \times 100 = \frac{31020}{20965} \times 100 = 147.96$$


10.5.3. Simple Average of Price Relatives. In this method,
first of all we obtain the price relatives for each commodity. The
price relatives are obtained by expressing the price of the commodity
in the current year as a percentage of its price in the base
year:

$$P = \text{Price Relative for a commodity} = \frac{p_1}{p_0} \times 100 \qquad (10.17)$$

Price relatives are the simplest form of the index numbers for
each commodity. The price index for the composite group is obtained
on averaging these price relatives by using some suitable
measure of central tendency, usually arithmetic mean (A.M.) or
geometric mean (G.M.). Price index using simple arithmetic mean
of the relatives is given by:

$$P_{01}(\text{A.M.}) = \frac{1}{n} \sum \left(\frac{p_1}{p_0} \times 100\right) = \frac{1}{n} \sum P \qquad (10.18)$$

where $n$ is the number of commodities in the group.

Using simple geometric mean of the price relatives, the price
index is given by:

$$P_{01}(\text{G.M.}) = \left[\prod \left(\frac{p_1}{p_0} \times 100\right)\right]^{1/n} = \left[\prod P\right]^{1/n} \qquad (10.19)$$

where $\prod$ denotes the product of the price relatives for the $n$
commodities. To evaluate (10.19) we use logarithms. Taking logarithm
of both sides in (10.19) we get

$$\log P_{01}(\text{G.M.}) = \frac{1}{n} \sum \log\left(\frac{p_1}{p_0} \times 100\right) = \frac{1}{n} \sum \log P$$

$$\Rightarrow \quad P_{01}(\text{G.M.}) = \text{Antilog}\left[\frac{1}{n} \sum \log P\right] \qquad (10.19a)$$


Merits and Demerits. The index number based on the simple 
average of the price relatives overcomes some of the drawbacks of 
the ‘simple aggregate method’, viz., 


(i) Price relatives are pure numbers independent of the units 
of measurement and hence the index number based on their average 
is not affected by the units in which the prices are quoted. 


(ii) The extreme observations (large and small price quota- 
tions) do not influence the index unduly. It gives equal importance 
to all observations. 


The drawback of this method is that it gives equal weights to 
all the commodities and thus neglects their relative importance in 
the group. This drawback is removed by taking the weighted average
of the price relatives as discussed in § 10.5.4.


Another limitation of this method is the choice of the average 
to be used. As already discussed, G.M., though difficult to compute,
is theoretically a better average than A.M. However, because
of the computational ease, A.M. is used in practice. Some economists,
notably F.Y. Edgeworth, advocated the use of harmonic
mean for averaging the price relatives but it did not find favour with
others and is seldom used.


Remark. The distribution of the price relatives is found to be
positively skewed and the skewness increases as the base is shifted 
more and more away from the given year. 


Example 10.9. Construct Index Number for each year from
the following average annual wholesale prices of cotton with 1944 as
base:

Year    Wholesale prices (Rs.)      Year    Wholesale prices (Rs.)
1944            75                  1949            70
1945            50                  1950            69
1946            65                  1951            75
1947            60                  1952            84
1948            72                  1953            80

(Punjab Uni. B. Com., April 1977)

Solution. The index numbers for each year are obtained by
expressing the prices in the current year as a percentage of the price
in the base year 1944.

COMPUTATION OF PRICE INDEX NUMBERS

Year    Wholesale Prices    Index Number (Base: 1944 = 100)
1944          75            100
1945          50            (50/75) × 100 = 66.67
1946          65            (65/75) × 100 = 86.67
1947          60            (60/75) × 100 = 80.00
1948          72            (72/75) × 100 = 96.00
1949          70            (70/75) × 100 = 93.33
1950          69            (69/75) × 100 = 92.00
1951          75            (75/75) × 100 = 100.00
1952          84            (84/75) × 100 = 112.00
1953          80            (80/75) × 100 = 106.67




Example 10.10. The following are the prices of commodities
in 1970 and 1975. Calculate a price index based on price relatives
using the arithmetic mean as well as geometric mean.

Commodity    A     B     C     D     E      F
1970        45    60    20    50    85    120
1975        55    70    30    75    90    130

(Bombay Uni. B. Com. Oct. 1975)

Solution.

COMPUTATION OF PRICE INDEX BASED ON
A.M. AND G.M.

Commodity        Price              Price Relative             log P
             In 1970   In 1975      $P = (p_1/p_0) \times 100$
             ($p_0$)   ($p_1$)
A               45        55             122.22                2.0871
B               60        70             116.67                2.0667
C               20        30             150.00                2.1761
D               50        75             150.00                2.1761
E               85        90             105.88                2.0246
F              120       130             108.33                2.0347

Total                                $\sum P = 753.1$      $\sum \log P = 12.5653$

Index Number based on Arithmetic Mean is:

$$P_{01}(\text{A.M.}) = \frac{1}{n} \sum \left(\frac{p_1}{p_0} \times 100\right) = \frac{1}{n} \sum P = \frac{753.1}{6} = 125.517$$

Index Number based on Geometric Mean is given by:

$$\log P_{01}(\text{G.M.}) = \frac{1}{n} \sum \log P = \frac{1}{6} \times 12.5653 = 2.0942$$

$$\Rightarrow \quad P_{01}(\text{G.M.}) = \text{Antilog}(2.0942) = 124.2$$
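The logarithmic route of (10.19a) can be sketched in a few lines. The data are the six pairs of prices from the computation table of Example 10.10; because the script uses full floating-point precision rather than four-figure logarithm tables, the last digit of the geometric mean may differ slightly from a hand computation.

```python
import math

# Prices from the computation table of Example 10.10 (commodities A to F)
p0 = [45, 60, 20, 50, 85, 120]   # base year
p1 = [55, 70, 30, 75, 90, 130]   # current year

# Price relatives P = (p1 / p0) * 100
relatives = [b / a * 100 for a, b in zip(p0, p1)]
n = len(relatives)

# Simple arithmetic mean of price relatives, formula (10.18)
am_index = sum(relatives) / n

# Simple geometric mean via logarithms, formula (10.19a):
# antilog of the mean of log P
mean_log = sum(math.log10(r) for r in relatives) / n
gm_index = 10 ** mean_log

print(round(am_index, 2), round(gm_index, 2))
```

The geometric mean is never larger than the arithmetic mean of the same relatives, which the two results illustrate.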


Example 10.11. Calculate Price Index for 1975 and 1976 using
1970 as base year from the following data:

Commodity        Prices (Rs. per unit)
             1970      1975      1976
A               5         6         4
B               7        10         7
C               8        12         6
D              20        17        16
E             500       550       540

[Delhi U. B. Com. (External), 1982]

Solution.

CALCULATIONS FOR PRICE INDICES

Commodity              Price Relatives
             For 1975                   For 1976
             $(p_1/p_0) \times 100$     $(p_2/p_0) \times 100$
A               120.00                     80.00
B               142.86                    100.00
C               150.00                     75.00
D                85.00                     80.00
E               110.00                    108.00

Total           607.86                    443.00

$$\text{Price index for 1975} = \frac{1}{5} \sum \left(\frac{p_1}{p_0} \times 100\right) = \frac{607.86}{5} = 121.57$$

$$\text{Price index for 1976} = \frac{1}{5} \sum \left(\frac{p_2}{p_0} \times 100\right) = \frac{443}{5} = 88.6$$

10.5.4. Weighted Average of Price Relatives. The shortcoming
of the Simple Average of Relatives Method, which assumes that
all the relatives are equally important, is overcome in this method,
which consists in assigning appropriate weights to the relatives and
taking the weighted average, usually A.M. or G.M., of the price
relatives. Thus, based on weighted A.M. the price index is given
by:

$$P_{01}(\text{A.M.}) = \frac{\sum W \left(\frac{p_1}{p_0} \times 100\right)}{\sum W} = \frac{\sum W P}{\sum W} \qquad (10.20)$$
where $W$ is the weight attached to the price relative $P$.

Steps for Computing $P_{01}$ (A.M.) in (10.20)

1. Find the price relatives for each commodity, viz.,
$P = \frac{p_1}{p_0} \times 100$.

2. Multiply the price relatives in step 1 by the corresponding
weights ($W$) assigned to get the product $WP$.

3. Obtain the sum of products obtained in step 2 for all the
commodities to get $\sum WP$.

4. Divide the sum in step 3 by $\sum W$, the total of the weights
assigned.

The resulting figure gives the price index based on the weighted
average of price relatives.


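The price index based on the weighted geometric mean of
price relatives is given by

$$P_{01}(\text{weighted G.M.}) = \left[\prod \left(\frac{p_1}{p_0} \times 100\right)^{W}\right]^{1/\sum W} = \left[\prod P^{W}\right]^{1/\sum W} \qquad (10.21)$$

Taking logarithm of both sides we get

$$\log\left[P_{01}(\text{weighted G.M.})\right] = \frac{1}{\sum W}\left[\sum W \log\left(\frac{p_1}{p_0} \times 100\right)\right] = \frac{1}{\sum W}\left[\sum W \log P\right]$$

$$\Rightarrow \quad P_{01}(\text{weighted G.M.}) = \text{Antilog}\left[\frac{\sum W \log P}{\sum W}\right] \qquad (10.22)$$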

For computational purposes, formula (10.22) is used and 
requires the following steps. 

Steps 1. Compute the price relatives $P = \frac{p_1}{p_0} \times 100$ for each
commodity.

2. Find the logarithms of all the price relatives. This gives
us log P values.

3. Multiply log P values for each commodity by the corresponding
weights (W) assigned. This will give W log P values.

4. Find the sum of the values W log P in step 3 over all the
commodities to get $\sum W \log P$.

5. Divide the sum obtained in step 4 by $\sum W$, the total of
weights.

6. Antilog of the value obtained in step 5 gives the required price
index.
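The six steps above can be sketched as a small function. The prices and weights used here are hypothetical and serve only as an illustration; `weighted_gm_index` is an illustrative name, and common (base 10) logarithms are used as in the text.

```python
import math


def weighted_gm_index(base_prices, current_prices, weights):
    """Weighted geometric mean of price relatives, following formula (10.22)."""
    # Step 1: price relatives P = (p1 / p0) * 100
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base_prices, current_prices)]
    # Steps 2 to 4: log P, then W * log P, then their sum
    weighted_log_sum = sum(w * math.log10(r) for w, r in zip(weights, relatives))
    # Step 5: divide by the total of the weights
    mean_log = weighted_log_sum / sum(weights)
    # Step 6: antilog
    return 10 ** mean_log


# Hypothetical data: four commodities with fixed weights
base = [2.0, 2.5, 3.0, 1.0]
current = [4.5, 3.2, 4.5, 1.8]
weights = [5, 7, 6, 2]

index = weighted_gm_index(base, current, weights)
print(round(index, 2))
```

Working with logarithms replaces the product of powers in (10.21) with a weighted sum, which is what made the formula practical for hand computation.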
Remarks 1. Since price relatives are the simplest form of
the index numbers, we may also use the notation $I$ for $P$, i.e., we
may write
Example 10.14. The group indices of wholesale prices in India
for the second week of September 1958 and the group weights are
given below. Compute the index number of wholesale prices in the
week.

Group                        Weight     Index
Food Articles                  31       473.6
Manufactures                   30       390.2
Industrial Raw Materials       18       510.2
Semi-Manufactures              17       403.3
Miscellaneous                   4       624.4

Solution.

COMPUTATION OF WHOLESALE PRICE INDEX

S. No.   Group                       Weight (W)    Index (I)        IW
1.       Food Articles                   31          473.6       14681.6
2.       Manufactures                    30          390.2       11706.0
3.       Industrial Raw Materials        18          510.2        9183.6
4.       Semi-Manufactures               17          403.3        6856.1
5.       Miscellaneous                    4          624.4        2497.6

                                  $\sum W = 100$              $\sum IW = 44924.9$

Required index number of wholesale prices in the second
week of September 1958 is given by

$$P_{01}(\text{A.M.}) = \frac{\sum IW}{\sum W} = \frac{44924.9}{100} = 449.249$$
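Combining group indices is a direct application of the weighted arithmetic mean (10.20), with the group indices playing the role of the price relatives. A minimal sketch with the weights and indices of Example 10.14 (group labels shortened for illustration):

```python
# (group, weight W, group index I) as given in Example 10.14
groups = [
    ("Food articles", 31, 473.6),
    ("Manufactures", 30, 390.2),
    ("Industrial raw materials", 18, 510.2),
    ("Semi-manufactures", 17, 403.3),
    ("Miscellaneous", 4, 624.4),
]

total_weight = sum(w for _, w, _ in groups)
weighted_sum = sum(w * i for _, w, i in groups)

# Weighted arithmetic mean of the group indices
wholesale_index = weighted_sum / total_weight
print(total_weight, round(weighted_sum, 1), round(wholesale_index, 3))
```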


EXERCISE 10.1 


1. What is an index number? Explain the various problems involved
in the construction of index numbers.
(Nagarjuna Uni. B. Com., April 1981; Guru Nanak Dev. U. B. Com., 1981)


2. Discuss briefly the problems faced in the construction of an index 
number of prices. [Delhi Uni. B.A. (Econ. Hons. I) 1986;
Himachal Pradesh Uni. M.A. (Econ.) July 1983]


3. Examine the usefulness of price index numbers. Discuss the pro- 
blems faced in the construction of an index number. 
[Delhi Uni. B. Com. (Hons.) II, 1984;
Guru Nanak Dev Uni. B. Com II, April 1983]
4. "An index number is a special type of average". Discuss.
[Delhi Uni. B. Com. (Hons.) 1980]
5. "In the construction of index numbers the advantages of geometric
mean are greater than those of arithmetic mean". Discuss.
(Karnataka Uni. B. Com., April 1982)




6. What are index numbers? Why are they called economic
barometers? [Delhi Uni. B. Com. (Hons.) 1982]


7. What is the importance of weighting in the construction of index 
numbers ? Which of the three —mean, median and geometric 
mean—will you prefer in calculating index numbers and why ? 


8. What are Index Numbers? How are they constructed? Explain
the role of weights in the construction of general Price Index Numbers. 


[Delhi Uni. B. Com. (External) 1982] 


9. Distinguish between the methods of assigning weights in Laspeyre's 
and Paasche’s price index numbers. Show that Laspeyre’s price index is greater 
than Paasche's price index in case of rising prices. 

[Delhi Uni. В.А. (Econ. Hons.) 1982] 

10. "Laspeyre's index tends to be greater than Paasche's index".
Comment. [Delhi Uni. B.Com. (Hons.) II, 1985]


11. (a) Distinguish briefly between the Laspeyre’s and Paasche's index 
numbers. [Delhi Uni. B.A. (Econ. Hons. 1), 1983] 


(b) Write a short note on Fisher's ideal index number. 
[Delhi Uni. B.A. (Econ. Hons. II), 1983]
12. What is the difference between Laspeyre's and Paasche's systems of 


weights in compiling a price index ? Calculate both Laspeyre’s and Paasche's 
aggregative price indices for the year 1960 from the following data ; 


Commodities       Quantities          Price Per Unit
                 1959    1960         1959     1960
A                  3       5           2.0      2.5
B                  4       6           2.5      3.0
C                  2       3           3.0      2.5
D                  1       2           1.0      0.75


Ans. 109.78; 109.72 [Delhi Uni. B.A. (Econ. Hons.) 1981, 1978]


13. From the data given below compute Laspeyre's and Paasche's index
numbers.


Commodity          Price               Quantity
                1935    1945         1935    1945
A                 4      10           50      40
B                 3       9           10       2
C                 2       4            5       2


(Price and Quantity figures are in appropriate units).
Ans. 254.16; 250.58 (Guru Nanak Dev Uni. B. Com. II, Sept. 1983)


14. (a) Using Paasche's formula, compute the quantity index and the
price index numbers for 1970 with 1966 as base year:


Commodity       Quantity (Units)       Value (in Rs.)
                1966     1970          1966     1970
A                100      150           500      900
B                 80      100           320      500
C                 60       72           150      360
D                 30       33           360      291


[U.C.W.A. Final (O.S.) June 1975]

(b) For the above problem also compute price index by
(i) Marshall-Edgeworth formula (ii) Fisher's formula
(iii) Dorbish-Bowley formula (iv) Walsch formula.
Ans. (a) $P_{01}^{Pa} = 119.2$; $Q_{01}^{Pa} = 131.09$
(b) (i) 118.68, (ii) 118.62, (iii) 118.6225, (iv) 118.64


15. ‘Marshall-Edgeworth index number is a good approximation to the 
Fisher's Ideal Index Number."—Verify the truth of this Statement from the 
following data : 


Price Quant ity 
100 


Jowar 


Price Quantity Price Quantity 


3 
5 


Ans. $P_{01}^{ME} = 49.135$; $P_{01}^{F} = 49.134$ [U.C.W.A. (Final), Dec. 1980]


16. From the data given below construct an index number of the group 
of four commodities by using (i) Simple Aggregate Method and (ii) Fisher's 
Ideal Formula. 


                      Base year                     Current year
Commodities   Price per     Expenditure     Price per     Expenditure
              unit (Rs.)      (Rs.)         unit (Rs.)      (Rs.)
A                 2             40              5             75
B                 4             16              8             40
C                 1             10              2             24
D                 5             25             10             60


[Delhi Uni. B.Com. (Hons.) II, 1984; Delhi U. B. Com. (External) 1980;
Bombay Uni. B. Com. 1976]
Ans. (i) 208.33; (ii) 219.13


17. Using Fisher's Ideal Formula, compute price and quantity index
numbers for 1984 with 1982 as base year, given the following information:


Year        Commodity A           Commodity B           Commodity C
          Price   Quantity      Price   Quantity      Price   Quantity
          (Rs.)     (kg)        (Rs.)     (kg)        (Rs.)     (kg)
1982        5        10           8         6           6         3
1984        4        12           7         7           5         4


[C.A. (Intermediate) Nov. 1985]
Ans. $P_{01}^{F} = 83.59$, $Q_{01}^{F} = 120.6$
18. Compute by Fisher's formula the Quantity Index Number from the
data given below:


                    1974                         1976
Article     Price    Total Value         Price    Total Value
            (Rs.)      (Rs.)             (Rs.)      (Rs.)
A             5          50                4          48
B             8          48                7          49
C             6          18                5          20


[Delhi Uni. B. Com, 1977]
Ans. $Q_{01}^{F} = 120.8$


19. What are Index Numbers ? Why are they called economic baro- 
meters ? 


On the basis of the following information, calculate the Fisher's Ideal 
Index Number : 


                   Base year             Current year
Commodities     Price   Quantity       Price   Quantity
A                 2        40            6        50
B                 4        50            8        40
C                 6        20            9        30
D                 8        10            6        20
E                10        10            5        20


[Delhi Uni. B. Com. III, 1985; Delhi Uni. B. Com. (Hons.) 1982]
Ans. $P_{01}^{F} = 149.15$


20. (a) What is an Index Number ? Explain the terms—Price Relative ; 
Quantity Relative ; and Value Relative—with reference to a single 
commodity. 


(b) What do you understand by price relatives and discuss the 
method of constructing index numbers based on them. 


21. In a given community, the articles bought and sold for consumption
purposes are bread, milk and beef. Their respective prices for the two periods 
are the following :— 


Period I Period IL 
Bread (loaf) Shilling 0:15 Shilling 0:20 
Milk (pint) 55002010, 572045 
Beef (pound) aS 0:30 д 0:25 
Construct an Index Number from the above. Assigning hypothetical
weights of 2,000; 5,000 and 1,000 units to bread, milk and beef, also construct a
weighted index number. (Punjab Uni. B. Com., Sept. 1979)


Ans. $P_{01} = 122.22$, $P_{01}$ (weighted) $= 137.49$


22. From the informations given below, prepare Index Numbers of 
Price for three years with average price as base : 


(Rate per Rs.)
(Year)        (Wheat)      (Rice)        (Sugar)
1st year      2 kgm        1 kgm         .400 kgm
2nd year      1.6 kgm      .800 kgm      .400 kgm
3rd year      1 kgm        .750 kgm      .250 kgm


(Kurukshetra Uni. B. Com. II, April 1982)
Ans. 79.32, 92.21 and 128.86 for the three years respectively.


23. Calculate the index number by using geometric mean, 


Commodity Base Year Price Current Year Price 
A 2 7 
B 4 5 


(Bombay Uni. B. Com., April 1981) 
Ans. 209.17


24. The following are the prices of commodities in 1978 and 1979. 
Calculate a price index based on price relatives, using the geometric mean.


Year              Commodity
          A      B      C      D      E      F
1978     45     60     20     50     85    120
1979     60     70     30     75     90    130

[Bombay Uni. B.Com., 1975, 1978] 
Ans. 126. 


25. The price quotations of four different commodities for 1970 and 1975
are given below. Calculate the index number for 1975 with 1970 as base by
using (i) simple average of price relatives, and (ii) weighted average of price
relatives.


Commodity     Weight        Price in rupees
                            1975      1970
A                5          4.50      2.00
B                7          3.20      2.50
C                6          4.50      3.00
D                2          1.80      1.00


[Delhi Uni. B.Com. 1980; Bombay Uni. B. [...]
Ans. (i) 170.75, (ii) 164.05
26. Calculate price index of the following data by taking 1975 = 100 by
weighted average of relatives method:


                      1975                   1976
Commodities    Price (Rs.)   Quantity     Price (Rs.)
A                  20           2             25
B                  10           3             12
C                  12           5             18
D                  16           4             16
E                   5           7              4


(Delhi Uni. B.Com., 1981)
Ans. 110.48


27. Given below are the prices and weights of given commodities for the
years 1980, 1981 and 1982:


Commodity     Weight          Prices in rupees
                          1980      1981      1982
A               20       12.00     18.00     24.00
B               15        3.00      6.00     15.00
C               10       12.50     18.75     25.00
D               40       10.00     30.00     50.00
E               15        4.50      9.00     13.50


Using either aggregative method or relative method, calculate the
weighted price index numbers for 1981 and 1982 taking 1980 as the base year.
(Himachal Pradesh Uni. M. Com., July, 1983)


Ans. Price indices based on Price Relatives are:
For 1981: 225; For 1982: 380.


10.6. Tests of Consistency of Index Number Formulae.
In the last section § 10.5 we have discussed various formulae for
the construction of index numbers. None of the formulae measures
the price changes or quantity changes with exactitude or perfection,
and each has some bias. The problem is to choose the most appropriate
formula in a given situation. As a measure of the formula error a
number of mathematical tests, known as the tests of consistency of
index number formulae, have been suggested. In this section we
shall discuss these tests, which are also sometimes termed as the
criteria for a good index number.
1. Unit Test. This test requires that the index number
formula should be independent of the units in which the prices or
quantities of various commodities are quoted. All the formulae
discussed in § 10.5 except the index number based on Simple Aggregate
of Prices (Quantities) satisfy this test.


2. Time Reversal Test. The time reversal test, proposed by Prof. Irving Fisher, requires the index number formula to possess time consistency by working both forward and backward with respect to time. In Fisher's words :

"The formula for calculating an index number should be such that it gives the same ratio between one point of comparison and the other, no matter which of the two is taken as the base or putting it another way, the index number reckoned forward should be reciprocal of the one reckoned backward."

In other words, if the index numbers are computed for the same data relating to two periods by the same formula but with the bases reversed, then the two index numbers so obtained should be the reciprocals of each other. Mathematically, we should have (omitting the factor 100),

$$P_{01} \times P_{10} = 1 \qquad \ldots(10.17)$$

or more generally

$$P_{ab} \times P_{ba} = 1 \qquad \ldots(10.18)$$

where $P_{ab}$ is the price index (without factor 100) for year 'b' with year 'a' as base and $P_{ba}$ is the price index (without factor 100) for year 'a' with year 'b' as base.


Time reversal test is satisfied by the following index number formulae :

(i) Marshall-Edgeworth formula [c.f. Example 10.17 (iii)]
(ii) Fisher's ideal formula [c.f. Example 10.17 (v)]
(iii) Walsch formula [c.f. Example 10.17 (iv)]
(iv) Kelly's fixed weight formula [proved below]
(v) Simple aggregate index [c.f. Example 10.17 (i)]
(vi) Simple Geometric Mean of Price Relatives formula [proved below]
(vii) Weighted Geometric Mean of Price Relatives formula with fixed weights.

Laspeyre's and Paasche's index numbers do not satisfy this test [c.f. Example 10.17 (ii) and 10.17 (vi)].
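Let us verify this test for Kelly's fixed weight formula. We have (without factor 100)

$$P_{01}^{K} = \frac{\sum W p_1}{\sum W p_0}, \qquad P_{10}^{K} = \frac{\sum W p_0}{\sum W p_1}$$

$$\therefore \quad P_{01}^{K} \times P_{10}^{K} = \frac{\sum W p_1}{\sum W p_0} \times \frac{\sum W p_0}{\sum W p_1} = 1$$

Hence Kelly's fixed weight formula satisfies time reversal test.

For the index number based on simple G.M. of price relatives,

$$P_{01}(G.M.) = \left[ \prod \left( \frac{p_1}{p_0} \right) \right]^{1/n}$$

we have :

$$P_{01}(G.M.) \times P_{10}(G.M.) = \left[ \prod \left( \frac{p_1}{p_0} \right) \right]^{1/n} \times \left[ \prod \left( \frac{p_0}{p_1} \right) \right]^{1/n} = \left[ \prod \left( \frac{p_1}{p_0} \times \frac{p_0}{p_1} \right) \right]^{1/n} = 1$$

Hence the simple geometric mean of price relatives formula satisfies time reversal test. Similarly the test can be verified for the weighted geometric mean of price relatives index with fixed weights.

Remark. $P_{10}$ can be obtained from the formula for $P_{01}$ by interchanging the subscripts 0 and 1, i.e., replacing 0 by 1 and 1 by 0.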
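The time reversal test lends itself to a direct numerical check. The following is a minimal sketch, assuming the prices and quantities of Example 10.15 and helper names (`agg`, `time_reversal_product`) chosen here for illustration. Each index is written as a function of (base prices, base quantities, current prices, current quantities), so that reversing the base amounts to swapping the two periods, as the Remark above describes.

```python
from math import sqrt, isclose

def agg(p, q):
    """Sum of price * quantity over all commodities."""
    return sum(a * b for a, b in zip(p, q))

# Each index takes (base prices, base quantities, current prices,
# current quantities) and is returned without the factor 100.
def laspeyres(pb, qb, pc, qc):
    return agg(pc, qb) / agg(pb, qb)

def paasche(pb, qb, pc, qc):
    return agg(pc, qc) / agg(pb, qc)

def fisher(pb, qb, pc, qc):
    return sqrt(laspeyres(pb, qb, pc, qc) * paasche(pb, qb, pc, qc))

def time_reversal_product(index, p0, q0, p1, q1):
    """P01 * P10; P10 is obtained by interchanging the two periods."""
    return index(p0, q0, p1, q1) * index(p1, q1, p0, q0)

# Data of Example 10.15 (commodities A, B, C, D).
p0, q0 = [6, 2, 4, 10], [50, 100, 60, 30]
p1, q1 = [10, 2, 6, 12], [56, 120, 60, 24]

products = {f.__name__: time_reversal_product(f, p0, q0, p1, q1)
            for f in (laspeyres, paasche, fisher)}
for name, value in products.items():
    verdict = "satisfies" if isclose(value, 1.0) else "does not satisfy"
    print(f"{name:10s} P01 x P10 = {value:.6f} ({verdict} the test)")
```

A single data set can only refute a formula, not prove it; the algebraic verification above remains the actual proof for the formulae that pass.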

3. Factor Reversal Test. This is the second of the two im- 
portant tests of consistency proposed by Prof. Irving Fisher. Accor- 
ding to him : 

“Just as our formula should permit the interchange of two times 
without giving inconsistent results, so it ought to permit interchanging 
the prices and quantities without giving inconsistent results—i.e., the 
two results multiplied together should give the true value ratio, except 
for a constant of proportionality.” 


This implies that if the price and quantity indices are obtained for the same data, same base and current periods and using the same formula, then their product (without the factor 100) should give the true value ratio, since price multiplied by quantity gives total value. Symbolically, we should have (without factor 100),

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0} \qquad \ldots(10.19)$$

where $\sum p_1 q_1$ and $\sum p_0 q_0$ denote the total value in the current and base year respectively.


Fisher's formula satisfies the factor reversal test [c.f. Example 10.17 (v)]. In fact, among the formulae discussed in § 10.5, Fisher's index is the only one satisfying this test. Proofs that some of the others, viz., Laspeyre's, Paasche's, Marshall-Edgeworth, Simple Aggregate and Walsch index numbers, do not satisfy the factor reversal test are given in Example 10.17.

Remark. Since Fisher's index is the only index which satisfies both the time reversal and factor reversal tests, it is sometimes termed as Fisher's Ideal Index.
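The factor reversal test can be checked numerically in the same spirit. The sketch below assumes the prices and quantities that appear in the computation table of Example 10.16; the function names and the argument convention are illustrative choices, not the book's notation. The quantity index is obtained from the price index formula simply by interchanging the roles of prices and quantities.

```python
from math import sqrt, isclose

def agg(p, q):
    """Sum of products over all commodities."""
    return sum(a * b for a, b in zip(p, q))

# Arguments: (base prices, base quantities, current prices, current quantities).
def laspeyres(pb, qb, pc, qc):
    return agg(pc, qb) / agg(pb, qb)

def paasche(pb, qb, pc, qc):
    return agg(pc, qc) / agg(pb, qc)

def fisher(pb, qb, pc, qc):
    return sqrt(laspeyres(pb, qb, pc, qc) * paasche(pb, qb, pc, qc))

def factor_reversal_product(index, p0, q0, p1, q1):
    """P01 * Q01, where Q01 uses the same formula with p and q interchanged."""
    return index(p0, q0, p1, q1) * index(q0, p0, q1, p1)

# Data read from the computation table of Example 10.16 (commodities A to E).
p0, q0 = [6.5, 2.8, 4.7, 10.9, 8.6], [500, 124, 69, 38, 49]
p1, q1 = [10.8, 2.9, 8.2, 13.4, 10.8], [560, 148, 78, 24, 27]

value_ratio = agg(p1, q1) / agg(p0, q0)
products = {f.__name__: factor_reversal_product(f, p0, q0, p1, q1)
            for f in (laspeyres, paasche, fisher)}

print(f"value ratio = {value_ratio:.4f}")
for name, value in products.items():
    print(f"{name:10s} P01 x Q01 = {value:.4f}")
```

Only the Fisher product coincides with the value ratio; the Laspeyre's product falls below it and the Paasche's product above it for these data.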


4. Circular Test. Circular test, first suggested by Westergaard, is an extension of time reversal test for more than two periods. For three periods a, b and c it requires that

$$P_{ab} \times P_{bc} \times P_{ca} = 1, \quad a \neq b \neq c \qquad \ldots(10.20)$$

where $P_{ij}$ is the price index (without factor 100) for period 'j' with period 'i' as base. In the usual notations (10.20) can be stated as :

$$P_{01} \times P_{12} \times P_{20} = 1 \qquad \ldots(10.21)$$
For instance, for Laspeyre's index,

$$P_{01}^{La} \times P_{12}^{La} \times P_{20}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_2 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_2}{\sum p_2 q_2} \neq 1$$

Hence Laspeyre's index does not satisfy the circular test. Similarly, it can be verified that none of Paasche's, Marshall-Edgeworth's, Walsch's and Fisher's indices satisfies this test. In fact, circular test is not satisfied by any of the weighted aggregative formulae with changing weights, i.e., if the weights used in the construction of index numbers $P_{01}$, $P_{12}$ and $P_{20}$ change. This test is satisfied only by the index number formulae based on :

(i) Simple geometric mean of the price relatives,
and
(ii) Kelly's fixed weight formula.


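For example, for the index numbers based on simple geometric mean of price relatives we have :

$$P_{01} \times P_{12} \times P_{20} = \left[ \prod \left( \frac{p_1}{p_0} \right) \right]^{1/n} \times \left[ \prod \left( \frac{p_2}{p_1} \right) \right]^{1/n} \times \left[ \prod \left( \frac{p_0}{p_2} \right) \right]^{1/n} = 1 \qquad \ldots(10.22)$$

Hence circular test holds in this case.

Similarly the index number based on Kelly's fixed weight formula gives (without factor 100)

$$P_{01} \times P_{12} \times P_{20} = \frac{\sum W p_1}{\sum W p_0} \times \frac{\sum W p_2}{\sum W p_1} \times \frac{\sum W p_0}{\sum W p_2} = 1.$$

Remark. Generalisation of (10.21). The circular test can be generalised to the case of more than three periods to give :

$$P_{01} \times P_{12} \times P_{23} \times \cdots \times P_{n-1,\,n} \times P_{n0} = 1 \qquad \ldots(10.23)$$

where the indices are considered without the factor 100.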
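The circular test (10.21) can be tried out on three periods of data. The sketch below is an illustration under assumed figures: the prices, quantities and fixed weights are hypothetical, and the function names are chosen here for convenience. It compares the simple geometric mean of price relatives, Kelly's fixed weight formula and Laspeyre's formula.

```python
from math import prod, isclose

def gm_index(pa, pb):
    """Simple G.M. of price relatives for period b on base a (no factor 100)."""
    n = len(pa)
    return prod(b / a for a, b in zip(pa, pb)) ** (1 / n)

def kelly(pa, pb, w):
    """Kelly's fixed weight index: sum(W*pb) / sum(W*pa)."""
    return sum(x * b for x, b in zip(w, pb)) / sum(x * a for x, a in zip(w, pa))

def laspeyres(pa, qa, pb):
    """Laspeyre's index: weights are the quantities of the base period a."""
    return sum(b * q for b, q in zip(pb, qa)) / sum(a * q for a, q in zip(pa, qa))

# Hypothetical prices and quantities for periods 0, 1, 2 and fixed weights W.
p = [[4, 16, 8], [6, 20, 10], [8, 24, 16]]
q = [[10, 5, 8], [12, 4, 9], [9, 6, 7]]
w = [3, 2, 5]

gm_cycle = gm_index(p[0], p[1]) * gm_index(p[1], p[2]) * gm_index(p[2], p[0])
kelly_cycle = kelly(p[0], p[1], w) * kelly(p[1], p[2], w) * kelly(p[2], p[0], w)
lasp_cycle = (laspeyres(p[0], q[0], p[1]) * laspeyres(p[1], q[1], p[2])
              * laspeyres(p[2], q[2], p[0]))

print(f"G.M. of relatives : {gm_cycle:.6f}")
print(f"Kelly fixed weight: {kelly_cycle:.6f}")
print(f"Laspeyre's        : {lasp_cycle:.6f}")
```

The first two products equal 1 apart from floating-point rounding, while the Laspeyre's product does not, because its weights change from link to link.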


Example 10.15. From the following prove that Fisher's Ideal Index satisfies both the Time Reversal Test and the Factor Reversal Test and calculate its value.

             Base year            Current year
Commodity    Price    Quantity    Price    Quantity
A            6        50          10       56
B            2        100         2        120
C            4        60          6        60
D            10       30          12       24

[Madras Uni. B.Com., Nov. 1978 ;
Karnataka Uni. B.Com., April 1981]

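Solution.

COMPUTATION OF FISHER'S INDEX

Commodity    p0    q0     p1    q1     p0q0    p0q1    p1q0    p1q1
A            6     50     10    56     300     336     500     560
B            2     100    2     120    200     240     200     240
C            4     60     6     60     240     240     360     360
D            10    30     12    24     300     240     360     288

Total                                  1040    1056    1420    1448

Fisher's price index number is given by :

$$P_{01}^{F} = 100 \times \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} = 100 \times \sqrt{\frac{1420 \times 1448}{1040 \times 1056}}$$

$$= 100 \times \sqrt{\frac{2056160}{1098240}} = 100 \times \sqrt{1.8722} = 100 \times 1.3683 = 136.83$$

Time Reversal Test : We have (without factor 100) $P_{01}^{F} = 1.3683$ and

$$P_{10}^{F} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{1056 \times 1040}{1448 \times 1420}} = \sqrt{\frac{1098240}{2056160}} = \sqrt{0.5341} = 0.7308$$

Now

$$P_{01}^{F} \times P_{10}^{F} = 1.3683 \times 0.7308 = 0.99995 \simeq 1$$

Hence Fisher's index satisfies time reversal test.

Factor Reversal Test. We have (without factor 100)

$$Q_{01}^{F} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{1056 \times 1448}{1040 \times 1420}} = \sqrt{\frac{1529088}{1476800}} = \sqrt{1.0354} = 1.0175$$

Hence

$$P_{01}^{F} \times Q_{01}^{F} = 1.3683 \times 1.0175 = 1.3922$$

Also

$$\frac{\sum p_1 q_1}{\sum p_0 q_0} = \frac{1448}{1040} = 1.3923$$

$$\therefore \quad P_{01}^{F} \times Q_{01}^{F} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

apart from the error of rounding in the fourth decimal place. Hence Fisher's index satisfies Factor Reversal Test also.

Aliter : We have (without factor 100)

$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0 \times \sum p_1 q_1}{\sum p_0 q_0 \times \sum p_0 q_1}} = \sqrt{\frac{1420 \times 1448}{1040 \times 1056}}$$

$$P_{10}^{F} = \sqrt{\frac{\sum p_0 q_1 \times \sum p_0 q_0}{\sum p_1 q_1 \times \sum p_1 q_0}} = \sqrt{\frac{1056 \times 1040}{1448 \times 1420}}$$

$$\therefore \quad P_{01}^{F} \times P_{10}^{F} = \sqrt{\frac{1420 \times 1448 \times 1056 \times 1040}{1040 \times 1056 \times 1448 \times 1420}} = \sqrt{1} = 1$$

Hence Fisher's Index satisfies Time Reversal Test.

$$Q_{01}^{F} = \sqrt{\frac{\sum q_1 p_0 \times \sum q_1 p_1}{\sum q_0 p_0 \times \sum q_0 p_1}} = \sqrt{\frac{1056 \times 1448}{1040 \times 1420}}$$

$$\therefore \quad P_{01}^{F} \times Q_{01}^{F} = \sqrt{\frac{1420 \times 1448}{1040 \times 1056} \times \frac{1056 \times 1448}{1040 \times 1420}} = \sqrt{\left( \frac{1448}{1040} \right)^{2}} = \frac{1448}{1040} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence Fisher's index satisfies Factor Reversal Test.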


Remark. If we are not asked to compute Fisher's index but simply to test whether it satisfies the Time Reversal and/or Factor Reversal Tests, then the alternative method given above is very convenient for numerical computations.

Example 10.16. Calculate Laspeyre's, Paasche's and Fisher's indices for the following data. Also examine which of the above indices satisfy (i) Time reversal test, (ii) Factor reversal test.

             Base year            Current year
Commodity    Price    Quantity    Price    Quantity
A            6.5      500         10.8     560
B            2.8      124         2.9      148
C            4.7      69          8.2      78
D            10.9     38          13.4     24
E            8.6      49          10.8     27

Solution.


COMPUTATIONS FOR LASPEYRE'S, PAASCHE'S AND FISHER'S INDICES

Commodity    p0      q0     p1      q1     p0q0      p1q0      p0q1      p1q1
A            6.5     500    10.8    560    3250.0    5400.0    3640.0    6048.0
B            2.8     124    2.9     148    347.2     359.6     414.4     429.2
C            4.7     69     8.2     78     324.3     565.8     366.6     639.6
D            10.9    38     13.4    24     414.2     509.2     261.6     321.6
E            8.6     49     10.8    27     421.4     529.2     232.2     291.6

Total                                      4757.1    7363.8    4914.8    7730.0

Thus

(i) $P_{01}^{La} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \dfrac{7363.8}{4757.1} \times 100 = 1.5480 \times 100 = 154.80$

(ii) $P_{01}^{Pa} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \dfrac{7730.0}{4914.8} \times 100 = 1.5728 \times 100 = 157.28$

(iii) $P_{01}^{F} = \left( P_{01}^{La} \times P_{01}^{Pa} \right)^{1/2} = (154.80 \times 157.28)^{1/2} = 156.03$

(iv) $Q_{01}^{La} = \dfrac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \dfrac{4914.8}{4757.1} \times 100 = 1.0332 \times 100 = 103.32$

(v) $Q_{01}^{Pa} = \dfrac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \dfrac{7730.0}{7363.8} \times 100 = 1.0497 \times 100 = 104.97$

(vi) $Q_{01}^{F} = \sqrt{Q_{01}^{La} \times Q_{01}^{Pa}} = \sqrt{103.32 \times 104.97} = 104.14$

Time Reversal Test : We should have (without factor 100)

$$P_{01} \times P_{10} = 1$$

(vii) $P_{10}^{La} = \dfrac{\sum p_0 q_1}{\sum p_1 q_1} \times 100 = \dfrac{4914.8}{7730.0} \times 100 = 0.6358 \times 100 = 63.58$

(viii) $P_{10}^{Pa} = \dfrac{\sum p_0 q_0}{\sum p_1 q_0} \times 100 = \dfrac{4757.1}{7363.8} \times 100 = 0.6460 \times 100 = 64.60$

(ix) $P_{10}^{F} = \left( P_{10}^{La} \times P_{10}^{Pa} \right)^{1/2} = \sqrt{63.58 \times 64.60} = 64.09$

Hence

$P_{01}^{La} \times P_{10}^{La} = 1.5480 \times 0.6358 = 0.9842 \neq 1$
$P_{01}^{Pa} \times P_{10}^{Pa} = 1.5728 \times 0.6460 = 1.0160 \neq 1$
$P_{01}^{F} \times P_{10}^{F} = 1.5603 \times 0.6409 = 1.0000 = 1$

Hence Fisher's formula satisfies the time reversal test. Laspeyre's and Paasche's formulae do not satisfy this test.

Factor Reversal Test. We should have :

$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

(x) $P_{01}^{La} \times Q_{01}^{La} = 1.5480 \times 1.0332 = 1.5993$
(xi) $P_{01}^{Pa} \times Q_{01}^{Pa} = 1.5728 \times 1.0497 = 1.6510$
(xii) $P_{01}^{F} \times Q_{01}^{F} = 1.5603 \times 1.0414 = 1.6249$

Also

$$\frac{\sum p_1 q_1}{\sum p_0 q_0} = \frac{7730.0}{4757.1} = 1.6249$$

Hence

$$P_{01}^{F} \times Q_{01}^{F} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Fisher's formula satisfies factor reversal test also. Laspeyre's and Paasche's formulae do not satisfy this test.


Example 10.17. Explain fully the concept and use of an index number. Discuss the role of weighting in the construction of index numbers. Describe the reversal tests for index numbers and examine the following formulae in the light of these tests :

(i) $\dfrac{\sum p_1}{\sum p_0} \times 100$

(ii) $\dfrac{\sum q_0 p_1}{\sum q_0 p_0} \times 100$

(iii) $\dfrac{\sum (q_0 + q_1)\, p_1}{\sum (q_0 + q_1)\, p_0} \times 100$ (Bombay Uni. B. Com., 1975)

(iv) $\dfrac{\sum \sqrt{q_0 q_1}\; p_1}{\sum \sqrt{q_0 q_1}\; p_0} \times 100$ (Bombay Uni. B. Com., Oct. 1973)

(v) $\sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$ (Bombay Uni. B. Com., 1975)

(vi) $\dfrac{\sum q_1 p_1}{\sum q_1 p_0} \times 100$
Solution. We have (without the factor 100)

(i) $P_{01} = \dfrac{\sum p_1}{\sum p_0}, \quad P_{10} = \dfrac{\sum p_0}{\sum p_1}, \quad Q_{01} = \dfrac{\sum q_1}{\sum q_0}$

This is the Simple Aggregative Type of Index Number.

$$P_{01} \times P_{10} = \frac{\sum p_1}{\sum p_0} \times \frac{\sum p_0}{\sum p_1} = 1$$

Hence the given index (Simple Aggregative Price Index) satisfies Time Reversal Test.

Also

$$P_{01} \times Q_{01} = \frac{\sum p_1}{\sum p_0} \times \frac{\sum q_1}{\sum q_0} \neq \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence the given index does not satisfy Factor Reversal Test.
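(ii) $P_{01} = \dfrac{\sum p_1 q_0}{\sum p_0 q_0}, \quad P_{10} = \dfrac{\sum p_0 q_1}{\sum p_1 q_1}, \quad Q_{01} = \dfrac{\sum q_1 p_0}{\sum q_0 p_0}$

The given index is nothing but Laspeyre's Price Index.

$$P_{01}^{La} \times P_{10}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \neq 1$$

Hence the given index (Laspeyre's Index) does not satisfy Time Reversal Test.

$$P_{01}^{La} \times Q_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum q_1 p_0}{\sum q_0 p_0} \neq \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence Laspeyre's Index does not satisfy Factor Reversal Test.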
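(iii) $P_{01} = \dfrac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)}, \quad P_{10} = \dfrac{\sum p_0 (q_1 + q_0)}{\sum p_1 (q_1 + q_0)}$

The given index is nothing but Marshall-Edgeworth Price Index number.

$$P_{01}^{ME} \times P_{10}^{ME} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times \frac{\sum p_0 (q_1 + q_0)}{\sum p_1 (q_1 + q_0)} = 1$$

Hence Marshall-Edgeworth Index number satisfies Time Reversal Test.

$$Q_{01}^{ME} = \frac{\sum q_1 (p_0 + p_1)}{\sum q_0 (p_0 + p_1)}$$

$$P_{01}^{ME} \times Q_{01}^{ME} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times \frac{\sum q_1 (p_0 + p_1)}{\sum q_0 (p_0 + p_1)} \neq \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Thus Marshall-Edgeworth Index number does not satisfy Factor Reversal Test.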
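(iv) $P_{01} = \dfrac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}}, \quad P_{10} = \dfrac{\sum p_0 \sqrt{q_1 q_0}}{\sum p_1 \sqrt{q_1 q_0}}$

The given index is the Walsch Price Index number.

$$P_{01}^{W} \times P_{10}^{W} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times \frac{\sum p_0 \sqrt{q_1 q_0}}{\sum p_1 \sqrt{q_1 q_0}} = 1$$

Hence Walsch Price Index satisfies Time Reversal Test.

$$Q_{01}^{W} = \frac{\sum q_1 \sqrt{p_0 p_1}}{\sum q_0 \sqrt{p_0 p_1}}$$

$$P_{01}^{W} \times Q_{01}^{W} = \frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times \frac{\sum q_1 \sqrt{p_0 p_1}}{\sum q_0 \sqrt{p_0 p_1}} \neq \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence Walsch Price Index does not satisfy Factor Reversal Test.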

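(v) $P_{01} = \sqrt{\dfrac{\sum p_1 q_0}{\sum p_0 q_0} \times \dfrac{\sum p_1 q_1}{\sum p_0 q_1}}, \quad P_{10} = \sqrt{\dfrac{\sum p_0 q_1}{\sum p_1 q_1} \times \dfrac{\sum p_0 q_0}{\sum p_1 q_0}}$

The given index number is Fisher's Price Index number.

$$P_{01}^{F} \times P_{10}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}}$$

$$= \sqrt{\frac{\sum p_1 q_0 \times \sum p_1 q_1 \times \sum p_0 q_1 \times \sum p_0 q_0}{\sum p_0 q_0 \times \sum p_0 q_1 \times \sum p_1 q_1 \times \sum p_1 q_0}} = \sqrt{1} = 1$$

Hence Fisher's Index satisfies Time Reversal Test.

$$Q_{01}^{F} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}}$$

$$P_{01}^{F} \times Q_{01}^{F} = \sqrt{\frac{\sum p_1 q_0 \times \sum p_1 q_1 \times \sum p_0 q_1 \times \sum p_1 q_1}{\sum p_0 q_0 \times \sum p_0 q_1 \times \sum p_0 q_0 \times \sum p_1 q_0}} = \sqrt{\frac{\left( \sum p_1 q_1 \right)^{2}}{\left( \sum p_0 q_0 \right)^{2}}} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence Fisher's Price Index satisfies Factor Reversal Test.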


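(vi) $P_{01} = \dfrac{\sum p_1 q_1}{\sum p_0 q_1}, \quad P_{10} = \dfrac{\sum p_0 q_0}{\sum p_1 q_0}$

This Index is Paasche's Price Index.

$$P_{01}^{Pa} \times P_{10}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0} \neq 1$$

Hence Paasche's Price Index does not satisfy Time Reversal Test.

$$Q_{01}^{Pa} = \frac{\sum q_1 p_1}{\sum q_0 p_1}$$

$$P_{01}^{Pa} \times Q_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum q_1 p_1}{\sum q_0 p_1} \neq \frac{\sum p_1 q_1}{\sum p_0 q_0}$$

Hence Paasche's index does not satisfy Factor Reversal Test.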
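The conclusions of parts (iii) and (iv) can be confirmed on actual figures. The sketch below is illustrative only: it borrows the data of Example 10.15, and the function names and argument order are assumptions made here. It evaluates the Marshall-Edgeworth and Walsch formulae and forms both reversal products.

```python
from math import sqrt, isclose

# Arguments: (base prices, base quantities, current prices, current quantities);
# indices are returned without the factor 100.
def marshall_edgeworth(pb, qb, pc, qc):
    num = sum(c * (x + y) for c, x, y in zip(pc, qb, qc))
    den = sum(b * (x + y) for b, x, y in zip(pb, qb, qc))
    return num / den

def walsch(pb, qb, pc, qc):
    num = sum(c * sqrt(x * y) for c, x, y in zip(pc, qb, qc))
    den = sum(b * sqrt(x * y) for b, x, y in zip(pb, qb, qc))
    return num / den

# Data of Example 10.15.
p0, q0 = [6, 2, 4, 10], [50, 100, 60, 30]
p1, q1 = [10, 2, 6, 12], [56, 120, 60, 24]

value_ratio = (sum(a * b for a, b in zip(p1, q1))
               / sum(a * b for a, b in zip(p0, q0)))

results = {}
for index in (marshall_edgeworth, walsch):
    time_product = index(p0, q0, p1, q1) * index(p1, q1, p0, q0)
    # Quantity index: same formula with prices and quantities interchanged.
    factor_product = index(p0, q0, p1, q1) * index(q0, p0, q1, p1)
    results[index.__name__] = (time_product, factor_product)
    print(f"{index.__name__:18s} P01 x P10 = {time_product:.6f}  "
          f"P01 x Q01 = {factor_product:.6f}  value ratio = {value_ratio:.6f}")
```

For these data the time reversal products equal 1, while the factor reversal products differ from the value ratio only in the fourth decimal place, so both formulae fail that test narrowly.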
EXERCISE 10.2

1. (a) What do you mean by tests of consistency for an index number ?

(b) Explain Index Numbers. What tests is a good index number expected to satisfy ?

2. Explain the time reversal and factor reversal tests. Examine whether Laspeyre's and Paasche's index numbers satisfy these tests.
[Delhi Uni. B.A. (Econ. Hons. I), 1984]

3. Discuss Laspeyre's, Paasche's and Fisher's index numbers. Which of the three would you prefer and why ?
[Himachal Pradesh Uni. M.A. (Econ.), July 1983]

4. What is meant by reversibility of an index number ? Describe the time and factor reversal tests in the theory of index numbers. Give a formula which satisfies both these tests. [Punjab Uni. M.A. (Econ.), Oct.

5. (a) What is Fisher's Ideal Index ? Why is it called ideal ? Show that it satisfies both the time reversal test as well as the factor reversal test.
[Himachal Pradesh Uni. M. Com., July 1981 ; Delhi Uni. M.B.A., 1975]

(b) Is Fisher's index really an ideal index ? Give reasons in support of your answer. [Punjabi Uni. M.A. (Econ.), 1983]

6. Distinguish between Laspeyre's and Paasche's index numbers. When will they be equal ? Why is it that Fisher's index number is called Ideal Index Number ? (Bombay Uni. B. Com., May 1980)

7. What are time reversal and factor reversal tests ? State their uses.

Test whether the index number due to Walsh given by

$$\frac{\sum p_1 \sqrt{q_0 q_1}}{\sum p_0 \sqrt{q_0 q_1}} \times 100$$

satisfies time reversal test. (Bombay Uni. B. Com., Oct. 1973)

8. With the usual notations the Marshall-Edgeworth index number is defined as :

$$\frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100$$

and Fisher's ideal number is defined as :

$$\sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

Show which tests are satisfied by these formulae.
(Bombay Uni. B. Com., 1975)

9. State, giving reasons, whether the following statement is true or false :
Factor reversal test of Index Numbers is
М ®р\@\ 
PaxQu- 
aX Qor 5да 


(Delhi Uni, В. Com., 1983) 
Ans. False. 


10. From the following data, find the index number for 1970 taking 1969 as the base and that for 1969 taking 1970 as the base and show that the geometric mean makes the index reversible while the arithmetic mean does not.

Commodity    Price in Rs. per unit
             1969    1970
A            20      50
B            25      40

[Bombay Uni. B.Com., Nov. 1980]

Ans. $P_{01}$(A.M.) = 205, $P_{01}$(G.M.) = 200
$P_{10}$(A.M.) = 51.25, $P_{10}$(G.M.) = 50
$P_{01}$(A.M.) × $P_{10}$(A.M.) ≠ 1 ; $P_{01}$(G.M.) × $P_{10}$(G.M.) = 1
(Without the factor 100)


11. From the following data find the index numbers for the current year and the base year based on each other and show that the Geometric Mean makes it reversible but the Arithmetic Mean does not.

             Prices
Commodity    Base year    Current year
A            25           55
B            30           45

[I.C.W.A. (Intermediate) June 1985]
Ans. $P_{01}$(A.M.) = 185 ; $P_{01}$(G.M.) = 182 ;
$P_{10}$(A.M.) = 56 ; $P_{10}$(G.M.) = 55.

12. From the following prove that Fisher's Ideal Index satisfies both the Time Reversal Test and the Factor Reversal Test.

             Base year            Current year
Commodity    Price    Quantity    Price    Quantity
A            6        50          10       60
B            2        100         2        120
C            4        60          6        60

[Delhi Uni. B. Com., (Hons.), 1983] 


13, Calculate Fisher's Ideal Index Number from the following data and 
show that it satisfies Time and Factor Reversal Tests. 


1970 1980 
Commodity 
Price Quantity Price Quantity 
A 12 100 20 120 
B 4 200 5 220 
C 8 120 12 140 
D 20 60 24 75 


[Osmania Uni. B.Com. (Hons.), April 1983] 
Ans. $P_{01}^{F}$ = 141.32

14. What do you mean by Time Reversal Test, Factor Reversal Test and Circular Test ? Give the list of the formulae which satisfy the above tests respectively.


period with its changes in some particular fixed year called the base year. Fixed base indices, though simple to construct, have their limitations, some of which are outlined below :

comparisons on the basis of fixed base indices may be unrealistic, unreliable and may even be misleading.

(ii) The changes in the fashions and habits of the people during the two periods (current year and fixed base year) might lead to new innovations, and new products might have come in the market. Moreover, some of the commodities or items which were largely consumed in the base year might have become outdated and may have to be discarded. This is not possible under the fixed base method as it requires the same set of commodities or items to be used in both the periods.

(iii) Because of the inherent changes in the consumption patterns of the people due to time lag, the relative importance of the various commodities in the two periods may change considerably, thus necessitating a revision in the original weights.

in which the relative changes in the level of phenomenon for any period are compared with that of the immediately preceding period,

index numbers (by a suitable method) for each year with the preceding year as the base year. If $P_{ab}$ denotes the price index for current period 'b' with respect to the base period 'a', then we compute series

$$\begin{aligned}
P_{01} &= \text{First Link} \\
P_{02} &= P_{01} \times P_{12} \\
P_{03} &= (P_{01} \times P_{12}) \times P_{23} = P_{02} \times P_{23} \\
&\;\;\vdots \\
P_{0n} &= P_{0,\,n-1} \times P_{n-1,\,n}
\end{aligned} \qquad \ldots(10.24)$$
Thus the steps in the construction of the chain base index numbers may be summarised as follows :

1. For each commodity, express the price in any year as a percentage of its price in the preceding year. This gives the link relatives (L.R.). Thus

$$\text{L.R. for period } i = \frac{p_i}{p_{i-1}} \times 100, \quad (i = 1, 2, \ldots, n) \qquad \ldots(10.25)$$

2. Chain base indices (C.B.I.) are obtained on multiplying the link relatives successively as explained in (10.24). Thus

$$\text{C.B.I. for any year} = \frac{\text{Current year L.R.} \times \text{Preceding year C.B.I.}}{100} \qquad \ldots(10.26)$$


Remarks 1. Obviously, the techniques of computing the index number by the 'fixed base' and the 'chain base' methods are different, the former (F.B.I.) using the original (raw) data while the latter (C.B.I.) using the Link Relatives.

If there is only one series of observations, i.e., if we are given the prices (quantities) of only one commodity (item) for different years, then the fixed base indices and the chain base indices will always be same [See Example 10.22]. Hence in such a case we should always use the fixed base method since it requires much less calculation as compared with chain base method.

However, if there is more than one series then the chain base indices and fixed base indices would usually be different except for the first two years, for which they will always be equal [See Example 10.24].

2. Conversion of Chain Base Index Numbers to Fixed Base Index Numbers. Fixed base index (F.B.I.) numbers can be obtained from the chain base index (C.B.I.) numbers by using the following formula :

$$\text{Current year F.B.I.} = \frac{\text{Current year C.B.I.} \times \text{Previous year F.B.I.}}{100} \qquad \ldots(10.27)$$

the F.B.I. for the first period being same as the C.B.I. for the first period.

10.7.1. Uses of Chain Base Index Numbers. 1. In the chain base method the comparisons are made with the immediate past (preceding year) and accordingly the data (for the two periods being compared) are relatively homogeneous. The comparisons are, therefore, more valid and meaningful and the resulting index is more representative of the current trends in the tastes, habits, customs and fashions of the society. Hence, the chain base indices are specially useful to a businessman who is basically interested in comparison between the values of a phenomenon at two consecutive periods rather than the values of the phenomenon at any period with its value in some distant fixed base period.

According to Mudgett the chain base method gives better account of the dynamics of the transition from base year to given year than other methods.

2. In the chain base method new commodities or items may be included and the old and obsolete items may be deleted without impairing comparability and without requiring the recalculation of the entire series of index numbers, which is necessary in case of fixed base method. Moreover, the weights of the various commodities can be adjusted frequently. This flexibility greatly increases the utility of the chain indices over the fixed base index numbers. According to Marshall and Edgeworth chain base indices are the best means of making short-term comparisons.


significance of these indices, and give physical interpretations to them.

Example 10.18. Convert the following fixed base index numbers into chain base index numbers :

Year :     1970    1971    1972    1973    1974    1975
F.B.I. :   376     392     408     380     392     400

[Punjab Uni. B. Com., April 1977 ;
Punjab Uni. B.A. (Econ. Hons.), 1982]

Solution.

CONVERSION OF F.B.I. TO C.B.I.

Year    Link Relatives                 Chain Index
1970    -                              376
1971    (392/376) × 100 = 104.26       (376 × 104.26)/100 = 392
1972    (408/392) × 100 = 104.08       (392 × 104.08)/100 = 408
1973    (380/408) × 100 = 93.14        (408 × 93.14)/100 = 380
1974    (392/380) × 100 = 103.16       (380 × 103.16)/100 = 392
1975    (400/392) × 100 = 102.04       (392 × 102.04)/100 = 400

Remark. It may be noted that the chain base indices are same as the fixed base indices. In fact this will always be true for a single fixed series of index numbers.


Example 10.19. From the chain base index numbers given below, find fixed base index numbers :

Year :               1975    1976    1977    1978    1979
Chain base index :   80      110     120     90      140

[Delhi Uni. B. Com. (Hons.), 1981 ;
Madras Uni. B. Com., Sept. 1978]

Solution. Using formula (10.27), viz.,

$$\text{Current year F.B.I.} = \frac{\text{Current year C.B.I.} \times \text{Previous year F.B.I.}}{100},$$

the first year F.B.I. being same as first year C.B.I., we obtain the F.B.I. numbers as given in the following table.

CONVERSION OF C.B.I. NUMBERS TO F.B.I. NUMBERS

Year    Chain Index Number    Fixed Base Index Number
1975    80                    80
1976    110                   (110 × 80)/100 = 88
1977    120                   (120 × 88)/100 = 105.60
1978    90                    (90 × 105.60)/100 = 95.04
1979    140                   (140 × 95.04)/100 = 133.06
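Formula (10.27) and its inverse are easy to mechanise. The following sketch is an illustration, with function names chosen here rather than taken from the text: it converts the chain base figures of Example 10.19 to fixed base indices and then recovers the original figures from the result.

```python
def chain_to_fixed(cbi):
    """Apply (10.27): F.B.I. = current C.B.I. * previous F.B.I. / 100,
    the first F.B.I. being the same as the first C.B.I."""
    fbi = [float(cbi[0])]
    for current in cbi[1:]:
        fbi.append(current * fbi[-1] / 100)
    return fbi

def fixed_to_chain(fbi):
    """Invert (10.27): each figure as a percentage of the preceding F.B.I."""
    return [float(fbi[0])] + [100 * cur / prev for prev, cur in zip(fbi, fbi[1:])]

years = [1975, 1976, 1977, 1978, 1979]
cbi = [80, 110, 120, 90, 140]          # data of Example 10.19

fbi = chain_to_fixed(cbi)
recovered = fixed_to_chain(fbi)

for year, c, f in zip(years, cbi, fbi):
    print(year, c, round(f, 2))
```

Since each step multiplies by the current figure and divides by 100, an error in one year's figure carries forward into every later fixed base index.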


Example 10.20. From the following prices of three groups of commodities for the years 1973 to 1977, find the chain base index numbers chained to 1973 :

Groups    1973    1974    1975    1976    1977
I         4       6       8       10      12
II        16      20      24      30      36
III       8       10      16      20      24

[M.D. (Rohtak) Uni. B.Com., Sept. 1978 ;
Kurukshetra Uni. B.Com., April 1978]

Solution. See Page 570.
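The steps of the chain base method can be sketched in code for the data of Example 10.20. This is a minimal illustration, not the book's worked solution: it assumes that the link relatives of the three groups are averaged by their simple arithmetic mean and that the index for 1973 is taken as 100; the function name is hypothetical.

```python
def chain_base_indices(price_series):
    """Chain base indices from several price series of equal length.

    Step 1: link relatives (10.25) for each series.
    Step 2: average the link relatives of each year
            (simple arithmetic mean, an assumption made here).
    Step 3: chain the averages by (10.26), starting from 100.
    """
    n_periods = len(price_series[0])
    chain = [100.0]
    for t in range(1, n_periods):
        links = [100 * s[t] / s[t - 1] for s in price_series]
        average_link = sum(links) / len(links)
        chain.append(average_link * chain[-1] / 100)
    return chain

years = [1973, 1974, 1975, 1976, 1977]
prices = [
    [4, 6, 8, 10, 12],       # Group I
    [16, 20, 24, 30, 36],    # Group II
    [8, 10, 16, 20, 24],     # Group III
]

chain = chain_base_indices(prices)
for year, value in zip(years, chain):
    print(year, round(value, 2))
```

With a geometric mean in place of the arithmetic mean in step 2, the averaged link relatives, and hence the chain indices, would generally be somewhat lower.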
EXERCISE 10.3

1. (a) What are the limitations of the fixed base index numbers ?

(b) What are the Chain Base Index Numbers ? How are they constructed ? What are their uses ?

(c) Discuss the advantages of chain indices over fixed base indices. Also state their limitations.

2. (a) Distinguish between 'fixed' and 'chain' base indices. Give a suitable illustration to show the difference.
[Delhi Uni. B.Com. (Hons.) 1979]

(b) Distinguish between fixed base and chain base index numbers. What are their relative merits and demerits ?
[Punjab Uni. B.A. (Econ. Hons. II), April 1983;

3. From the fixed base index numbers given below, find out chain base index numbers :

Year :        1976    1977    1978    1979    1980    1981
Index No. :   200     220     240     250     280     300

[Guru Nanak Dev Uni. B. Com. II, Sept. 1982]
Ans. 200, 220, 240, 250, 280, 300

4. Convert the following series of index numbers to chain base indices :

Year :                    1960    1961    1962    1963    1964    1965    1966    1967
Index No. (Base 1960) :   100     110     125     133     149     139     150     165

[Bombay Uni. B. Com., May 1982]
Ans. 100, 110, 125, 133, 149, 139, 150, 165

5. Convert the following link relatives into price relatives, taking 1975 as the base :

Year :             1975    1976    1977    1978    1979    1980
Link Relatives :   120     150     180     225     270     324

[Delhi Uni. B. Com. III, 1984]
Ans. 120, 180, 324, 729, 1968, 6376

6. Construct chain index numbers from the link relatives given below :

Year :         1969    1970    1971    1972    1973
Link index :   100     105     95      115     102

[Delhi Uni. B. Com. (Hons.), 1974]
Ans. 100, 105, 99.75, 114.71, 117

7. From the fixed base index numbers given below obtain chain base index numbers.

Year :        1963    1964    1965    1966    1967    1968
Index No. :   150     180     120     120     80      96

[Punjab Uni. B.A. (Econ. Hons.), 1980]
Ans. 150, 180, 120, 120, 80, 96




8. From the chain base index numbers given below, prepare fixed base index numbers.

Year :        1954    1955    1956    1957    1958
Index No. :   90      110     115     120     130

[Punjab Uni. B.A. (Econ. Hons.), 1980]
Ans. 90, 99, 113.85, 136.62, 177.61

9. From the chain base index numbers given below, prepare fixed base index numbers.

Year :        1971    1972    1973    1974    1975
Index No. :   110     160     140     100     150

[Delhi Uni. B. Com. (Hons.), 1976]
Ans. 110, 176, 246.4, 492.8, 739.2

10. Prepare fixed base index numbers from the chain base index numbers given below :

Year :        1971    1972    1973    1974    1975    1976
Index No. :   92      102     104     98      103     101

[Punjab Uni. B.A. (Econ. Hons.), 1982]
Ans. 92, 93.84, 97.59, 95.64, 98.51, 99.50

11. From the following Chain Base Index Numbers calculate Fixed Base Index Numbers :

Year :        1971    1972    1973    1974    1975    1976
Index No. :   80      95      102     98      105     100

[Himachal Pradesh Uni. B. Com., April 1982]
Ans. 80, 76, 77.52, 75.97, 79.77, 79.77

12. From the following annual average prices of three commodities given in rupees per unit, find chain index numbers based on 1977 :

(Guru Nanak Dev Uni. B.Com. II, April 1982)
Ans. 100, 131.67, 166.05, 204.79, 212.36

13. Assuming that all the goods can be assigned equal weights, calculate the chain base index numbers for the years 1976 to 1980 on the basis of the following price relatives :

[Price Relative = (current year's price / previous year's price) × 100]

Year    Good A    Good B    Good C    Good D    Good E
1976    100       100       100       100       100
1977    90        125       134       118       133
1978    89        61        60        115       125
1979    112       200       80        93        140
1980    122       66        150       86        86

[Delhi Uni. B.A. (Econ. Hons. II), 1983]
Ans. 100, 120, 108, 135, 137.7
14. Calculate the chain base index numbers from the data given below :

Year    Price of Commodities (in rupees)
        A     B     C     D     E
1966    10    20    12    40    100
1967    12    22    14    45    110
1968    11    25    18    49    106
1969    14    28    10    43    102
1970    15    23    9     42    101

[Punjab Uni. B.A. (Econ. Hons. II), April 1981]
Ans. 100, 113.83, 122.74, 117.54, 111.88


10.8. Base Shifting, Splicing and Deflating of Index Numbers 


10.8.1. Base Shifting. Base shifting means the changing of the given base period (year) of a series of index numbers and recasting them into a new series based on some recent new base period. This step is quite often necessary under the following situations :

(i) When the base year is too old or too distant from the current period to make meaningful and valid comparisons. As already pointed out [Selection of Base Period, § 10.4], the base year should be a normal year of economic stability not too far distant from the given year.

(ii) If we want to compare series of index numbers with different base periods, to make quick and valid comparisons both the series must be expressed with a common base period.

Strictly, base shifting requires the recomputation of the entire series of the index numbers with the new base. However, this is a very difficult and time consuming job. A relatively much simpler, though approximate, method consists in taking the index number of the new base year as 100 and then expressing the given series of index numbers as a percentage of the index number of the time period selected as the new base year. Thus, the series of index numbers, recast with a new base, is obtained by the formula :

$$\text{Recast I. No. of any year} = \frac{\text{Old I. No. of the year}}{\text{I. No. of new base year}} \times 100 = \left( \frac{100}{\text{I. No. of new base year}} \right) \times (\text{Old I. No. of the year}) \qquad \ldots(10.28)$$

In other words, the new series of index numbers is obtained on multiplying the old index numbers with a common factor :

$$\frac{100}{\text{I. No. of New Base Year}}$$

The technique is explained below by numerical illustrations.
Remark. Rigorously speaking the above method is applicable only if the given index numbers satisfy the circular test [e.g., Index Number based on Kelly's fixed weight formula or simple geometric mean of price relatives]. However, most of the index numbers based on other methods also yield results which are, in practice, quite close to the theoretically correct values.

Example 10.21. Reconstruct the following indices using 1980 as the base.

Year :         1976    1977    1978    1979    1980    1981    1982
Index Nos. :   110     130     150     175     180     200     220

(Delhi Uni. B.Com., 1983)

Solution.

INDEX NUMBERS (BASE 1980 = 100)

Year    Index No.    Index Number (Base 1980 = 100)
1976    110          (100/180) × 110 = 61.11
1977    130          (100/180) × 130 = 72.22
1978    150          (100/180) × 150 = 83.33
1979    175          (100/180) × 175 = 97.22
1980    180          100.00
1981    200          (100/180) × 200 = 111.11
1982    220          (100/180) × 220 = 122.22
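The approximate base shifting rule (10.28) amounts to one multiplication per year. A short sketch using the figures of Example 10.21 follows; the function name `shift_base` and the use of a dictionary keyed by year are illustrative choices.

```python
def shift_base(indices, new_base_year):
    """Recast a series by (10.28): multiply every index number by
    100 / (index number of the new base year)."""
    factor = 100 / indices[new_base_year]
    return {year: value * factor for year, value in indices.items()}

# Data of Example 10.21.
old_series = {1976: 110, 1977: 130, 1978: 150, 1979: 175,
              1980: 180, 1981: 200, 1982: 220}

recast = shift_base(old_series, 1980)
for year, value in recast.items():
    print(year, round(value, 2))
```

The ratio between any two years is unchanged by the shift, which is why the method is exact only when the underlying formula satisfies the circular test.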


10.8.2. Splicing. An application of the principle of base shifting is in the technique of splicing, which consists in combining two or more overlapping series of index numbers to obtain a single continuous series. This continuity of the series of index numbers is required to facilitate comparisons. Let us suppose that we have a series of index numbers with some base period, say, 'a', that it is discontinued in the period 'b', and that with the terminating period of the first series as base, i.e., period 'b' as base, a second series of index numbers (with the same items) is constructed by the same method (formula). In order to secure continuity in comparisons the two series are put together, or spliced together, to get a continuous series. The method is explained below :

SPLICING OF TWO INDEX NUMBER SERIES

Year    Series I      Series II     Series II spliced to      Series I spliced to
        (Base 'a')    (Base 'b')    Series I (Base 'a')       Series II (Base 'b')
a       100                         100                       (100/a_r) × 100
a+1     a_1                         a_1                       (100/a_r) × a_1
a+2     a_2                         a_2                       (100/a_r) × a_2
...     ...                         ...                       ...
b-1     a_{r-1}                     a_{r-1}                   (100/a_r) × a_{r-1}
b       a_r           100           a_r                       100
b+1                   b_1           (a_r/100) × b_1           b_1
b+2                   b_2           (a_r/100) × b_2           b_2
b+3                   b_3           (a_r/100) × b_3           b_3

Explanation. When series II is spliced to series I to get a continuous series with base 'a',

100 of II series becomes $a_r$,
$b_1$ of II series becomes $\dfrac{a_r}{100} \times b_1$,
$b_2$ of II series becomes $\dfrac{a_r}{100} \times b_2$,

and so on. Thus multiplying each index of series II with the constant factor $\dfrac{a_r}{100}$, we get the new series of index numbers spliced to series I (Base 'a'). In this case series I is also said to be spliced forward.

If we splice series I to series II to get a new continuous series with Base 'b' then,

$a_r$ of I series becomes 100,
$a_{r-1}$ of I series becomes $\dfrac{100}{a_r} \times a_{r-1}$,
$a_1$ of I series becomes $\dfrac{100}{a_r} \times a_1$,

and so on. Thus, the new series of index numbers with series I spliced to series II (Base 'b') is obtained on multiplying each index of series I by the new constant factor $\left( \dfrac{100}{a_r} \right)$. In this case we also say that series II is spliced backward. We give below some numerical illustrations to explain the technique.


Example 10.22. Given below are two price index series. Splice
them on the base 1974 = 100. By what per cent did the price of steel
rise between 1970 and 1975 ?

Year    Old price index for Steel    New price index for Steel
        Base (1965 = 100)            Base (1974 = 100)
1970    141.5
1971    163.7
1972    158.2
1973    156.8                        99.8
1974    157.1                        100.0
1975                                 102.3

[Delhi Uni. B.A. (Econ. Hons.), 1979]
Solution.

SPLICING OF OLD PRICE INDEX TO NEW PRICE INDEX

Year    Old price index for Steel    New price index for Steel
        Base (1965 = 100)            Base (1974 = 100)
1970    141.5                        (100/157.1) × 141.5 = 90.07
1971    163.7                        (100/157.1) × 163.7 = 104.20
1972    158.2                        (100/157.1) × 158.2 = 100.70
1973    156.8                        (100/157.1) × 156.8 = 99.81
1974    157.1                        100.0
1975                                 102.3


Hence the percentage increase in the price of steel between
1970 and 1975 is

((102.30 − 90.07)/90.07) × 100 = 0.1358 × 100 = 13.58

Hence the required increase is 13.58%.


Remark. When the old index is spliced to the new index
(Base 1974), the index number for 1974, viz., 157.1 becomes 100.
Hence the multiplying factor for splicing is 100/157.1 = 0.6365.

Example 10.23. In 1920, a Statistical Bureau started an index
of production based on 1914 with the following results :

Year     1914 (Base)    1920    1929
Index    100            120     200

In 1936, the Bureau reconstructed the index on a plan with base
1929.

Year     1929 (Base)    1935
Index    100            150

In 1936, the Bureau again reconstructed the index on yet another
plan with the base year 1935.

Year     1935 (Base)    1939    1943
Index    100            120     150

Obtain a continuous series with the base 1935, by splicing the
three series. (Bombay Uni. B.Com., Oct. 1976)


Solution. First of all we shall splice the first series (Base
1914) to the second series (Base 1929). In doing so the old index
number for 1929, viz., 200 becomes 100. Hence the multiplying
factor for splicing is 100/200 = 0.5.

Then we splice the new continuous series (Base 1929) to the
third series (Base 1935). Here the old index number of 1935, viz.,
150 becomes 100. Hence the multiplying factor for splicing is
100/150 = 0.6667.

Year    First series    First series spliced       First two series spliced
        (Base 1914)     to second (Base 1929)      to third series (Base 1935)
1914    100             (100/200) × 100 = 50       (100/150) × 50 = 33.33
1920    120             (100/200) × 120 = 60       (100/150) × 60 = 40.00
1929    200             100                        (100/150) × 100 = 66.67
1935                    150                        100
1939                                               120
1943                                               150

10.8.3. Deflating of Index Numbers. Deflating means adjusting,
correcting or reducing a value which is inflated. Hence by
deflating of the price index numbers we mean adjusting them after
making allowance for the effect of changing price levels. This is
particularly desirable in the case of an economy which has inflationary
trends because in such an economy, the increase in the
prices of commodities or items over a period of time results in a fall
in the real income of the people (which is defined as the purchasing power of
money), and accordingly a rise in their money income or nominal
income may not amount to a rise in their real income. Thus, it
becomes necessary to adjust or correct nominal wages in accordance
with the rise in the corresponding price index to arrive at the real
income. The purchasing power is given by the reciprocal of the
index number and consequently the real income (or wages) is obtained
on dividing the money or nominal income by the corresponding
appropriate price index and multiplying the result by 100.


Symbolically,

$$\text{Real Wages} = \frac{\text{Money or Nominal Wages}}{\text{Price Index}} \times 100 \qquad \ldots(10.29)$$


The real income is also known as deflated income. 


This technique is extensively used to deflate value series or 
value indices, rupee sales, inventories, income, wages and so on. 
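
Formula (10.29) and the reciprocal rule for purchasing power can be sketched in Python as follows. The function names `real_value` and `purchasing_power` and the figures used are assumptions chosen for illustration only.

```python
def real_value(nominal_value, price_index):
    """Deflate a nominal value: (nominal value / price index) x 100."""
    return nominal_value / price_index * 100


def purchasing_power(price_index):
    """Worth of one unit of money relative to the base period (index = 100)."""
    return 100 / price_index


# Hypothetical figures: a nominal wage of Rs. 250 when the price index is 125.
nominal_wage = 250.0
price_index = 125.0

real_wage = real_value(nominal_wage, price_index)
rupee_worth = purchasing_power(price_index)
print(real_wage, rupee_worth)
```

The same `real_value` function could be applied, year by year, to any value series (sales, inventories, income) together with its matching price index.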


Example 10.24. The table below shows the average wages in
rupees per day of a group of industrial workers during the years
1960-1971. The consumer price indices for these years with 1960 as base
year are also shown.

Determine the Real Wages of the workers during the years
1960-1971 as compared with their wages in 1960.

Year    Average wage of workers (Rs.)    Consumer price index (1960 = 100)
1960    1.19                             100.0
1961    1.33                             107.6
1962    1.44                             106.6
1963    1.57                             107.6
1964    1.75                             116.2
1965    1.84                             118.8
1966    1.89                             119.8
1967    1.94                             120.2
1968    1.97                             119.9
1969    2.13                             121.7
1970    2.28                             125.9
1971    2.45                             129.3

Determine the purchasing power of the Rupee for the year 1971
as compared to the year 1960. What is the significance of this
result ? [I.C.W.A. (Final) Dec., 1977]


Solution. Real wages are obtained on dividing the average
wages by the corresponding index number and multiplying by 100.

COMPUTATION OF REAL WAGES

Year    Average wage    Consumer price index    Real wages of workers
        (Rs.) (1)       (1960 = 100) (2)        = (1)/(2) × 100
1960    1.19            100.0                   (1.19/100.0) × 100 = 1.19
1961    1.33            107.6                   (1.33/107.6) × 100 = 1.24
1962    1.44            106.6                   (1.44/106.6) × 100 = 1.35
1963    1.57            107.6                   (1.57/107.6) × 100 = 1.46
1964    1.75            116.2                   (1.75/116.2) × 100 = 1.51
1965    1.84            118.8                   (1.84/118.8) × 100 = 1.55
1966    1.89            119.8                   (1.89/119.8) × 100 = 1.58
1967    1.94            120.2                   (1.94/120.2) × 100 = 1.61
1968    1.97            119.9                   (1.97/119.9) × 100 = 1.64
1969    2.13            121.7                   (2.13/121.7) × 100 = 1.75
1970    2.28            125.9                   (2.28/125.9) × 100 = 1.81
1971    2.45            129.3                   (2.45/129.3) × 100 = 1.89


The purchasing power of the rupee in any year as compared 
to the year 1960 is given by the reciprocal of the corresponding 
consumer price index. 


Hence the purchasing power of the rupee in 1971 as compared to
the year 1960 is 100/129.3 = 0.77.

This implies that in 1971 we have to spend Re. 1 for buying
a commodity which cost 77 paise in 1960. This means that although
the average wage of the worker in 1971 is more than double his
wages in 1960, he is not twice as well off as in 1960, since the
purchasing power of the rupee has in reality fallen to Rs. 0.77,
i.e., seventy-seven paise only; his real wage has risen only from
Rs. 1.19 to Rs. 1.89.

Example 10.25. The employees of the Australian Steel Ltd.
have presented the following data in support of their contention that
they are entitled to a wage adjustment. Dollar amounts shown
represent the average weekly take-home pay of the group :

Year  :    1973       1974       1975       1976
Pay   :    $260.50    $263.80    $274.00    $282.50
Index :    126.8      129.5      136.2      141.2

(a) Compute the real wages based on the take-home pay and
the price indices given.

(b) Compute the amount of pay needed in 1976 to provide buying
power equal to that enjoyed in 1973.
(Punjab Uni. B. Com., April 1978)
Solution. (a)

COMPUTATION OF REAL WAGES

Year    Pay (in dollars)    Price Index    Real Wages = (2)/(3) × 100
(1)     (2)                 (3)            (4)
1973    260.50              126.8          (260.50/126.8) × 100 = 205.44
1974    263.80              129.5          (263.80/129.5) × 100 = 203.71
1975    274.00              136.2          (274.00/136.2) × 100 = 201.17
1976    282.50              141.2          (282.50/141.2) × 100 = 200.07


(b) In order that employees of the Australian Steel Ltd. have
the same buying capacity in 1976 as in 1973, their pay in 1976
should be

(260.50/126.8) × 141.2 = 2.0544 × 141.2 = 290.08 Dollars.


EXERCISE 10.4 
1. What is ‘base shifting’ ? Why does it become necessary to shift the
base of index numbers ? Give an example of the shifting of base of index
numbers.
(Rajasthan Uni. M. Com., 1977)

2. The following are price index numbers (Base 1965 = 100) :

Year      :  1965  1966  1967  1968  1969  1970  1971  1972  1973  1974  1975
Index No. :  100   120   122   116   120   120   137   136   149   156   137

Shift the base to 1970 and recast the index numbers.

Ans. 83.33, 100, 101.67, 96.67, 100, 100, 114.17, 113.33, 124.17, 130.00, 114.17

3. The following are the index numbers of wholesale prices of a certain
commodity based on 1972 :

Year      :  1972  1973  1974  1975  1976
Index No. :  100   108   120   150   210

Shift the base to 1974 and obtain new index numbers.
(Kurukshetra Uni. B.Com., Sept. 1980)

Ans. 83.33, 90, 100, 125, 175

4. In the following series of index numbers shift the base from 1960 to
1963 :

Year      :  1960  1961  1962  1963  1964  1965  1966  1967
Index No. :  100   105   110   125   135   180   195   205
(Lucknow Uni. B.Com., 1982)

Ans. 80, 84, 88, 100, 108, 144, 156, 164

5. The first series of index was started in 1910 and continued up to 1944,
in which year another series of index was started. Splice the two series together
so as to give a continuous series with base 1910 = 100.

Year    1st Series    2nd Series
1910    100
1911    120
1912    130
...
1943    150
1944    160           100
1945                  120
1946                  150
1947                  140

Ans. 100, 120, 130, ..., 150, 160, 192, 240, 224.


6. A firm in a certain industry has an index of material prices based on
movements in the prices of selected materials weighted by the quantities
consumed in the base year. The price index series based on 1950 = 100, for the
years 1960-1965 was as follows :

1960     1961     1962     1963     1964     1965
120.3    122.1    126.4    125.2    127.0    131.6

In 1965, the index was completely revised to take into account a change
in the type of materials used. The new index, based on 1965 = 100, showed the
following values :

1965    1966     1967
100     106.3    109.4

(i) Splice the new index to the old, i.e., splice ‘forward’.

(ii) Splice the old index to the new, i.e., splice ‘backward’.

Ans. (i)  1966 : 139.9, 1967 : 144.0
     (ii) 1960 : 91.4, 1961 : 92.8, 1962 : 96.0, 1963 : 95.1, 1964 : 96.5


7. What is meant by (i) base shifting, (ii) splicing and deflating of
index numbers ? Explain and illustrate.

8. (a) Explain how index number is used to measure the purchasing power
of money. [I.C.W.A. (Final), Dec. 1981]
(b) What do you understand by deflating of index numbers ? Illustrate
with the help of an example. [Delhi Uni. B.Com. (Hons.), 1983]


9. Mean monthly wages (x) and cost of living index numbers (y) for the
years 1970 to 1975 are given below :

Year    :  1970  1971  1972  1973  1974  1975
x (Rs.) :  360   400   480   520   550   590
y       :  100   104   115   160   210   260

In which year was the real income (i) the highest, (ii) the lowest ?
(Bombay Uni. B. Com., Nov. 1980)

Ans. (i) 1972, (ii) 1975.

10. The annual wages (in Rs.) of workers are given along with price
indices. Find the real wage indices :

Year          :  1970  1971  1972  1973  1974  1975  1976
Wages (Rs.)   :  180   220   340   360   365   370   375
Price indices :  100   170   300   320   330   340   350
[Kurukshetra Uni. B. Com., 1981]

Ans. 100, 71.90, 62.97, 62.50, 61.45, 60.46, 59.53

11. The following data relate to the income of the people and General
Index Number of Prices of a certain region. Calculate
(i) Real income, and
(ii) Index numbers of Real Income with 1973 as base :

Year                       :  1973  1974  1975  1976  1977  1978  1979
Income (in Rs.)            :  800   819   825   876   920   938   924
General Price Index Number :  100   105   110   120   125   140   140
(Punjab Uni. B.Com., Sept. 1980)

Ans. Real Wages : 800, 780, 750, 730, 736, 670, 660
I. No. of Real Wages : 100, 97.5, 93.75, 91.25, 92, 83.75, 82.5


12. From the following table showing the monthly wages of workers
from 1970 to 1977, construct the Real Wages Index Numbers.

Year                :  1970  1971  1972  1973  1974  1975  1976  1977
Monthly Wages (Rs.) :  150   170   200   250   320   360   380   400
Price Index         :  100   150   250   300   350   380   350   360

(Andhra Pradesh Uni. B.Com., April 1982)

Ans. 100, 75.56, 53.34, 55.56, 60.96, 63.16, 72.38, 74.08


13. Given below are the average wages in rupees per hour for
unskilled workers of a factory during the years 1975-80. Also shown are
Consumer Price Indices for these years (taking 1975 as base year with Price Index
100). Determine the real wages of workers during 1975-1980, compared with
their wages in 1975.

Year                       :  1975   1976    1977    1978    1979    1980
Consumer Price Index       :  100    120.2   121.7   125.9   129.3   140
Average Wage (Rupees/hour) :  1.19   1.94    2.13    2.28    2.45    3.10

How much is the worth of one rupee of 1975 in subsequent years ?
[I.C.W.A. (Final), June 1982]

Ans. Real wages (Rs.) : 1.19, 1.61, 1.75, 1.81, 1.895, 2.21
Purchasing power of rupee in 1980 as compared to the year 1975 is
Rs. 0.71.

10.9. Cost of Living Index Number. The wholesale price
index numbers measure the changes in the general level of prices
and they fail to reflect the effect of the increase or decrease of
prices on the cost of living of different classes or groups of people
in a society. Cost of living index numbers, also termed as ‘Consumer
Price Index Numbers’ or ‘Retail Price Index Numbers’, are designed
to measure the effects of changes in the prices of a basket of goods
and services on the purchasing power of a particular section or class
of the society during any given (current) period w.r.t. some fixed
(base) period. They reflect the average increase in the cost of
the commodities consumed by a class of people so that they can
maintain the same standard of living in the current year as in the
base year. Due to the wide variations in the tastes, customs and
fashions of different sections or classes of people, their consumption
patterns of various commodities also differ widely from class to
class or group to group (like poor, lower income group, high income
group, labour class, industrial workers, agricultural workers) and
even within the same class or group from region to region (rural,
urban, plains, hills, etc.). Accordingly, the price movements affect
these people (belonging to different classes or groups or regions)
differently. Hence, to study the effect of rise or fall in the prices of
various commodities consumed by a particular group or class of
people on their cost of living, the ‘cost of living’ index numbers
are constructed separately for different classes of people or groups
or sections of the society and also for different geographical areas
like town, city, rural area, urban area, hilly area and so on.


Remark. It should be clearly understood that the cost of living
index numbers measure the changes in the cost of living or
purchasing power of a particular class of people due to the
movements (rise or fall) in the retail prices only. They do not measure
the changes in the cost of living as a consequence of changes in the
living standards. The cost of living index numbers should not be
interpreted as a measure of ‘Standard of Living’. Cost of living
index numbers are based on (retail) prices and price is a factor which
affects the purchasing power of the class of the people. But price
of the commodities or consumer goods is only one of the various
factors on which the standard of living of people depends, some
other factors being family size, its age and sex-wise composition,
its income and occupation, place, region, etc., none of which is
taken into account while computing the cost of living index number.
Accordingly, the Sixth International Conference of Labour
Statisticians held under the auspices of the International Labour
Organisation (I.L.O.) in 1949 recommended the replacement of the term ‘Cost
of Living’ index by a more appropriate term ‘Consumer Price Index’
or ‘Retail Price Index’.


10.9.1. Main Steps in the Construction of Cost of Living 
Index Numbers 

(a) Scope and Coverage. As in the case of any index 
number, the first step in the construction of cost of living index 
numbers is to specify clearly the class of people (low income, high 
income, labour class, industrial worker, agricultural worker, etc.) 
for whom the index is desired. In addition to the class of people, 
the geographical area such as rural area, urban area, city or town, 
or a locality of a town, etc., should also be clearly defined. The 
class should form, as far as possible, a homogeneous group w.r.t. 
income. 


Remark. As already pointed out, the cost of living index is
intended to study the variations in the cost of living (due to the
price movements) of a particular class of people living in a particular
region. For example, we cannot construct a single cost of living
index number for, say, the low income class for the whole country
because there is wide variation in the retail prices of commodities
and the consumption pattern of this class of people in different
regions (states) of the country. Thus, the relative importance of
different commodities will be different in different regions. For
example, in Bengal rice and fish are relatively more important as
compared with wheat and meat. Accordingly the ‘class of people’
together with their region or place of stay should be clearly
specified.


(b) Family Budget Enquiry. After step (a), the next step
is to conduct a sample family budget enquiry. This is done by
selecting a sample of an adequate number of representative families
from the class of people for whom the index is designed. The
enquiry should be conducted in a normal period of economic
stability. The objective of the enquiry is to find out the expenses
which an average family (of the given class) incurs on different items
of consumption. The enquiry furnishes the information on the
following points :

1. The nature, quality and quantity of the commodities consumed
by the given class of people.

The commodities are broadly classified into the following five
major groups :

(i) Food, (ii) Clothing, (iii) Fuel and Lighting, (iv) House
Rent, and (v) Miscellaneous.


Each of these major groups is further subdivided into smaller
groups termed as sub-groups. For instance, the group ‘Food’ may
be sub-divided into cereals (wheat, rice, pulses, etc.) ; meat, fish and
poultry ; milk and milk products ; fats and oils ; fruits and
vegetables ; condiments and spices ; sugar ; non-alcoholic beverages ;
pan, supari and tobacco, etc. Similarly ‘Clothing’ may cover
clothing, bedding, foot-wear, etc. The last item ‘Miscellaneous’ includes
items like medical care, education and reading, amusement and
recreation, gifts and charities, transport and communication,
household requisites, personal care and effects and so on. It, however,
does not include non-consumption money transactions such as
payments towards provident fund, insurance premiums, purchase of
savings certificates and bonds, etc.


The procedure of selection of commodities for the construction 
of the index has been discussed in detail in § 10.4. Care should
be taken to include only those items or commodities which are 
primarily consumed by the given class of people for whom the 
index is to be constructed. 


2. The retail prices of different commodities selected for the 
index. The price quotations for the selected commodities should be 
obtained from ‘local markets’ where the class of people reside or 
from super bazars, fair-price shops or co-operative stores or depart- 
mental stores from where they usually do their shopping. [For 
details see § 10.4] 


3. From the prices of the commodities and their quantities
consumed, we can obtain : 


(i) The expenditure on each item (in a group) expressed as a 
ratio of the expenditure on the whole group, and 


(ii) The expenditure on each group expressed as a proportion 
of the expenditure on all the groups. 


10.9.2. Construction of Cost of Living Index Numbers. As 
already pointed out, the relative importance of different items of 
consumption is different for different classes or groups of people 
and even within the same class from region to region. Accordingly, 
the cost of living indices are obtained as weighted indices, by taking 
into consideration the relative importance of the commodities which 
is decided on the basis of the amount spent on various items. The 
cost of living index numbers are constructed by the following 
methods : 


(i) Aggregate Expenditure Method or Weighted Aggregate
Method. In this method, the quantities consumed in the base year
are used as weights. Thus in the usual notations :

$$\text{Cost of Living Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 \qquad \ldots(10.30)$$

$$= \frac{\text{Total expenditure in current year}}{\text{Total expenditure in base year}} \times 100,$$

where the total expenditure in the current year is obtained with base
year quantities as weights.

Formula (10.30) is nothing but Laspeyres’ price index.

(ii) Family Budget Method or Method of Weighted Relatives.
In this method the cost of living index is obtained on taking the
weighted average of price relatives, the weights being the values of
the quantities consumed in the base year. Thus, if we write

$$I = \text{Price Relative} = \frac{p_1}{p_0} \times 100 \quad \text{and} \quad W = p_0 q_0,$$

then

$$\text{Cost of Living Index} = \frac{\sum WI}{\sum W} \qquad \ldots(10.31)$$

Substituting the values of W and I we get

$$\text{Cost of Living Index} = \frac{\sum p_0 q_0 \left( \frac{p_1}{p_0} \times 100 \right)}{\sum p_0 q_0} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100,$$

which is same as (10.30).

Remark. Thus we see that the cost of living index numbers
obtained by both the methods are same.
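
The agreement of the two methods can be checked numerically with a short Python sketch. The function names and the prices and quantities below are hypothetical, chosen only to illustrate formulas (10.30) and (10.31).

```python
def aggregate_expenditure_index(p0, p1, q0):
    """Formula (10.30): sum(p1*q0) / sum(p0*q0) x 100."""
    current = sum(p * q for p, q in zip(p1, q0))
    base = sum(p * q for p, q in zip(p0, q0))
    return current / base * 100


def family_budget_index(p0, p1, q0):
    """Formula (10.31): sum(W*I) / sum(W), with I = p1/p0 x 100 and W = p0*q0."""
    weights = [p * q for p, q in zip(p0, q0)]
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    return sum(w * i for w, i in zip(weights, relatives)) / sum(weights)


# Hypothetical base year prices, current year prices and base year quantities.
p0 = [10, 4, 20]
p1 = [12, 5, 30]
q0 = [5, 10, 2]

index_a = aggregate_expenditure_index(p0, p1, q0)
index_b = family_budget_index(p0, p1, q0)
print(round(index_a, 2), round(index_b, 2))
```

Both functions return the same value (up to floating-point rounding), because the base year price in each weight cancels the base year price in the corresponding price relative.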


10.9.3. Uses of Cost of Living Index Numbers

1. Cost of living index numbers are used to determine the
purchasing power of money and for computing the real wages
(income) from the nominal or money wages (income). We have :

Purchasing Power of Money = 1 / (Cost of Living Index Number)

Real Wages = (Money Wages / Cost of Living Index) × 100

Thus, the cost of living index number enables us to find if the real
wages are rising or falling, the money wages remaining unchanged.

2. The government (Central and/or State) and many big
industrial and business units use the cost of living index numbers
to regulate the dearness allowance (D.A.) or grant of bonus to the
employees in order to compensate them for increased cost of living
due to price rise. They are used by the government for the
formulation of price policy, wage policy and general economic policies.

3. Cost of living indices are used for deflating income and
value series in national accounts. [For details see § 10.8, Deflating
of Index Numbers.]

4. Cost of living index numbers are used widely in wage
negotiations and wage contracts. For example, they are used for
automatic adjustment of wages under ‘Escalator Clauses’ in
collective bargaining agreements. An escalator clause provides for a certain
point automatic increase in the wages corresponding to a unit
increase in the consumer price index.

Example 10.26. Construct the cost of living index number from
the table given below :

Group                  Index for 1973    Expenditure
1. Food                550               46%
2. Clothing            215               10%
3. Fuel & Lighting     220               7%
4. House Rent          150               12%
5. Miscellaneous       275               25%

[Karnataka Uni. B.Com., April 1982 ; Nagarjuna Uni. B.Com., Oct.
1981 ; Bangalore Uni. B.Com., April 1978]


Solution.

COMPUTATION OF COST OF LIVING
INDEX NUMBER

Group                Index (I)    Expenditure (W)    WI
Food                 550          46                 25300
Clothing             215          10                 2150
Fuel and Lighting    220          7                  1540
House Rent           150          12                 1800
Miscellaneous        275          25                 6875
                                  ΣW = 100           ΣWI = 37665

Cost of living index number = ΣWI/ΣW = 37665/100 = 376.65


Example 10.27. In the construction of a certain Cost of Living
Index Number, the following group index numbers were found.
Calculate the Cost of Living Index Number by using (i) the weighted
arithmetic mean, and (ii) the weighted geometric mean.

Group                   Index Numbers    Weights
1. Food                 350              5
2. Fuel and Lighting    200              1
3. Clothing             240              1
4. House Rent           160              1
5. Miscellaneous        250              2

(Bombay Uni. B. Com., 1976)


Solution.

COMPUTATION OF CONSUMER PRICE INDEX BY
A.M. AND G.M.

Group                Index Number (I)    Weights (W)    WI      log I     W log I
Food                 350                 5              1750    2.5441    12.7205
Fuel and Lighting    200                 1              200     2.3010    2.3010
Clothing             240                 1              240     2.3802    2.3802
House Rent           160                 1              160     2.2041    2.2041
Miscellaneous        250                 2              500     2.3979    4.7958
                                         ΣW = 10        ΣWI             ΣW log I
                                                        = 2850          = 24.4016

The consumer price index using Arithmetic Mean is

P01 (A.M.) = ΣWI/ΣW = 2850/10 = 285

Using Geometric Mean, the consumer price index is given by :

log P01 (G.M.) = (ΣW log I)/ΣW = 24.4016/10 = 2.4401

P01 (G.M.) = Antilog (2.4401) = 275.5
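
The two averages used in Example 10.27 can be reproduced with the following Python sketch, which uses the group indices and weights of that example. The function names are assumptions made for illustration, and logarithms are computed directly rather than read from tables, so the last digit of the geometric mean may differ slightly from a table-based value.

```python
import math


def weighted_arithmetic_mean(indices, weights):
    """Sum(W*I) / Sum(W)."""
    return sum(w * i for i, w in zip(indices, weights)) / sum(weights)


def weighted_geometric_mean(indices, weights):
    """Antilog of Sum(W * log I) / Sum(W), using common (base 10) logarithms."""
    log_mean = sum(w * math.log10(i) for i, w in zip(indices, weights)) / sum(weights)
    return 10 ** log_mean


# Food, Fuel and Lighting, Clothing, House Rent, Miscellaneous (Example 10.27).
group_indices = [350, 200, 240, 160, 250]
group_weights = [5, 1, 1, 1, 2]

am_index = weighted_arithmetic_mean(group_indices, group_weights)
gm_index = weighted_geometric_mean(group_indices, group_weights)
print(round(am_index, 1), round(gm_index, 1))
```

As expected for unequal values, the weighted geometric mean comes out below the weighted arithmetic mean.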


Example 10.28. Calculate the Cost of Living Index Number
from the following data :

Items            Price                        Weights
                 Base Year    Current Year
Food             30           47              4
Fuel             8            12              1
Clothing         14           18              3
House Rent       22           15              2
Miscellaneous    25           30              1

(Delhi Uni. B. Com., 1982)

Solution.

CALCULATIONS FOR COST OF LIVING
INDEX NUMBER

Items            Weights    Prices                       Price Relatives           WP
                 (W)        Base Year    Current Year    P = (p1/p0) × 100
                            (p0)         (p1)
Food             4          30           47              156.67                    626.67
Fuel             1          8            12              150.00                    150.00
Clothing         3          14           18              128.57                    385.71
House Rent       2          22           15              68.18                     136.36
Miscellaneous    1          25           30              120.00                    120.00
                 ΣW = 11                                                           ΣWP = 1418.74

Cost of Living Index Number = ΣWP/ΣW = 1418.74/11 = 128.98


Example 10.29. On a certain date the Ministry of Labour retail
price index was 204.6. Percentage increases in price over some base
period were : Rent and Rates 65, Clothing 220, Fuel and Light 110,
Miscellaneous 125. What was the percentage increase in the food
group ? Given that the weights of the different items in the group were
as follows :

Food 60, Rent and Rates 16, Clothing 12, Fuel and Light 8,
Miscellaneous 4.


Solution. The current price index (I) for any commodity is 
obtained by adding 100 to the percentage increase in the price. Let 
us suppose that the percentage increase in the food group is x. 


COMPUTATION OF PRICE INDEX

Commodity         % increase    Current      Weight      WI
                  in price      index (I)    (W)
Food              x             100 + x      60          60(100 + x)
Rent and Rates    65            165          16          2640
Clothing          220           320          12          3840
Fuel and Light    110           210          8           1680
Miscellaneous     125           225          4           900
                                             ΣW = 100    ΣWI = 60x + 15060

Price index is given by :

Cost of Living Index = ΣWI/ΣW = (60x + 15060)/100

But the index is given to be 204.6.

∴ (60x + 15060)/100 = 204.6
⇒ 60x + 15060 = 20460
⇒ 60x = 20460 − 15060 = 5400
⇒ x = 5400/60 = 90

Hence the percentage increase in the food group is 90.
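
The same calculation can be sketched in Python by rearranging ΣWI/ΣW for the one unknown group. The function name `missing_percentage_increase` and its argument layout are assumptions made for illustration; the figures are those of Example 10.29.

```python
def missing_percentage_increase(overall_index, unknown_weight, known_groups):
    """Percentage increase of the one group whose index is not given.

    known_groups is a list of (percentage increase, weight) pairs; each
    current index is 100 + percentage increase.
    """
    total_weight = unknown_weight + sum(w for _, w in known_groups)
    known_wi = sum((100 + pct) * w for pct, w in known_groups)
    # overall_index = (unknown_weight * unknown_index + known_wi) / total_weight
    unknown_index = (overall_index * total_weight - known_wi) / unknown_weight
    return unknown_index - 100


# (percentage increase, weight) for Rent and Rates, Clothing,
# Fuel and Light, and Miscellaneous.
known = [(65, 16), (220, 12), (110, 8), (125, 4)]
food_increase = missing_percentage_increase(204.6, 60, known)
print(round(food_increase, 2))
```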


Example 10.30. A textile worker in the city of Bombay earns
Rs. 350 per month. The cost of living index for a particular month is
given as 136. Using the following data find out the amounts he spent
on house rent and clothing.

Group                Expenditure    Group Index
Food                 140            180
Clothing             ?              150
House rent           ?              100
Fuel and lighting    56             110
Miscellaneous        63             80

(Bombay Uni. B. Com., 1975)

Solution. Let the expenditure on clothing and house rent
be Rs. x and Rs. y respectively.


COMPUTATION OF COST OF LIVING INDEX

Group                Expenditure (W)    Group Index (I)    WI
Food                 140                180                25200
Clothing             x                  150                150x
House rent           y                  100                100y
Fuel and lighting    56                 110                6160
Miscellaneous        63                 80                 5040
                     ΣW = 350                              ΣWI
                     = x + y + 259                         = 36400 + 150x + 100y

Cost of living index is

ΣWI/ΣW = (36400 + 150x + 100y)/350 = 136 (Given)
⇒ 36400 + 150x + 100y = 136 × 350 = 47600
⇒ 150x + 100y = 47600 − 36400 = 11200          ...(*)
Also x + y + 259 = 350
⇒ x + y = 350 − 259 = 91                       ...(**)
Multiplying (**) by 100, we get
100x + 100y = 9100                             ...(***)

Subtracting (***) from (*), we have :
50x = 11200 − 9100 = 2100
⇒ x = 2100/50 = 42
Substituting in (**) we get :
y = 91 − x = 91 − 42 = 49


Hence the worker spent Rs. 42 on clothing and Rs. 49 on
house rent.

Example 10.31. The data below show the percentage increase
in price of a few selected food items and the weights attached to each
of them. Calculate the index number for the food group.

Food items      Weight    Percentage increase in price
Rice            33        180
Wheat           11        202
Dal             8         115
Ghee            5         212
Oil             5         175
Spices          3         517
Milk            7         260
Fish            9         426
Vegetables      9         332
Refreshments    10        279

Using the above food index and the information given below,
calculate the cost of living index number.

Group     Food    Clothing    Fuel & Light    Rent & Rates    Miscellaneous
Index     -       310         220             150             300
Weight    60      5           8               9               18

[I.C.W.A. (Final) Jan. 1972 (O.S.)]

Solution. The current index number for each item is obtained
on adding 100 to the percentage increase in price.


CALCULATIONS FOR FOOD INDEX

Food items      Weight    Percentage    Current      IW
                (W)       increase      Index (I)
Rice            33        180           280          9,240
Wheat           11        202           302          3,322
Dal             8         115           215          1,720
Ghee            5         212           312          1,560
Oil             5         175           275          1,375
Spices          3         517           617          1,851
Milk            7         260           360          2,520
Fish            9         426           526          4,734
Vegetables      9         332           432          3,888
Refreshments    10        279           379          3,790
Total           100                                  34,000

Index number for the food group = 34000/100 = 340

CALCULATIONS FOR COST OF LIVING INDEX

Group             Index (I)    Weight (W)    WI
Food              340          60            20,400
Clothing          310          5             1,550
Fuel and Light    220          8             1,760
Rent and Rates    150          9             1,350
Miscellaneous     300          18            5,400
Total                          100           30,460

Cost of Living Index = 30460/100 = 304.6


Example 10.32. From the following data relating to working
class consumer price index of a city, calculate index numbers for 1972
and 1973.

Group                 Food    Clothing    Fuel and    House    Miscellaneous
                                          Lighting    Rent
Weights               48      18          7           13       14
Group Indices 1972    110     120         110         100      110
Group Indices 1973    130     125         120         100      135

The wages were increased by 8% in 1973. Is this increase
sufficient ?
[Delhi Uni., B. Com. (Hons.), 1975]
Solution.

COMPUTATION OF INDEX NUMBERS
FOR 1972 AND 1973

Group                Weights     Group Indices    Group Indices    WI1        WI2
                     (W)         1972 (I1)        1973 (I2)
Food                 48          110              130              5280       6240
Clothing             18          120              125              2160       2250
Fuel and Lighting    7           110              120              770        840
House Rent           13          100              100              1300       1300
Miscellaneous        14          110              135              1540       1890
                     ΣW = 100                                      ΣWI1       ΣWI2
                                                                   = 11050    = 12520

Index number for 1972 = ΣWI1/ΣW = 11050/100 = 110.5

Index number for 1973 = ΣWI2/ΣW = 12520/100 = 125.2

Hence the increase in the consumer price index number from 1972 to
1973 is

125.2 − 110.5 = 14.7

Hence the percentage increase in the price index for 1973 is

(14.7/110.5) × 100 = 13.30

Hence an increase of 8% in the wages in 1973 is insufficient to
maintain the same standard of living as in 1972.
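
The comparison made in Example 10.32 can be sketched in Python as follows, using the weights and group indices of that example. The function and variable names are assumptions made for illustration.

```python
def weighted_index(group_indices, weights):
    """Weighted arithmetic mean of group indices: Sum(W*I) / Sum(W)."""
    return sum(w * i for i, w in zip(group_indices, weights)) / sum(weights)


# Food, Clothing, Fuel and Lighting, House Rent, Miscellaneous (Example 10.32).
weights = [48, 18, 7, 13, 14]
index_1972 = weighted_index([110, 120, 110, 100, 110], weights)
index_1973 = weighted_index([130, 125, 120, 100, 135], weights)

# Percentage rise in the consumer price index from 1972 to 1973.
price_rise_pct = (index_1973 - index_1972) / index_1972 * 100

# A wage rise keeps pace with prices only if it is at least as large.
wage_rise_pct = 8
wage_rise_sufficient = wage_rise_pct >= price_rise_pct
print(round(price_rise_pct, 2), wage_rise_sufficient)
```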


Example 10.33. An enquiry into the budgets of the middle
class families of a certain city revealed that on an average the percen- 
tage expenses on the different groups were : 


Food 45, Rent 15, Clothing 12, Fuel and Light 8, 
Miscellaneous 20. 


The group index numbers for the current year as compared with 
a fixed base period were respectively 410, 150, 343, 248 and 285. 
Calculate the Cost of Living Index Number for the current year. 


Mr. X was getting Rs. 240 in the base period and Rs. 430 in the
current year. State how much he ought to have received as extra
allowance to maintain his former standard of living.


[Madurai Uni. B. Com., Oct. 1981, April 1977 ; 
Osmania Uni. B. Com., April 1978] 


Solution. The percentage expenses on different groups may
be regarded as the weights attached to them.
Cost of living index is given by :

ΣWI/ΣW = (45 × 410 + 15 × 150 + 12 × 343 + 8 × 248 + 20 × 285)/(45 + 15 + 12 + 8 + 20)
       = 32500/100 = 325

This implies that if a person was getting Rs. 100 in the base
year then, in order that he is fully compensated for the rise in prices,
his salary in the current year should be Rs. 325. Hence if Mr. X was
getting Rs. 240 in the base year, his salary in the current period
should be

Rs. (325/100) × 240 = Rs. 780,

in order to enable him to maintain the same standard of living w.r.t.
rise in prices, other factors remaining constant. But the current salary
of Mr. X is given to be Rs. 430. Hence he ought to receive an extra
allowance of Rs. 780 − 430 = Rs. 350, to maintain the same
standard of living as in the base year.
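
The compensation argument of Example 10.33 can be sketched in Python as follows. The function names `required_income` and `extra_allowance` are assumptions made for illustration; the figures are those of the example.

```python
def required_income(base_income, cost_of_living_index):
    """Income needed now to match the base period's purchasing power."""
    return base_income * cost_of_living_index / 100


def extra_allowance(base_income, current_income, cost_of_living_index):
    """Shortfall between the required income and the income actually received."""
    return required_income(base_income, cost_of_living_index) - current_income


needed = required_income(240, 325)
allowance = extra_allowance(240, 430, 325)
print(needed, allowance)
```

A negative result from `extra_allowance` would indicate that the current income already exceeds what is needed to maintain the base period standard of living.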


10.10. Limitations of Index Numbers. Although index 
numbers are very important tools for studying the economic and 
business activity of a country, they have their limitations and as 
such should be used and interpreted with caution. The following 
are some of their limitations : 


(1) Since index numbers are based on the sample data, they 
are only approximate indicators and may not exactly represent the 
changes in the relative level of a phenomenon. 


(2) There is likelihood of error being introduced at each stage 
of the construction of the index numbers, viz., 


(i) Selection of commodities. 
(ii) Selection of the base period. 


(iii) Collection of the data relating to prices and quantities of 
the commodities. 


(iv) Choice of the formula, i.e., the system of weighting to be
used.


(v) The average to be used for obtaining the index for the
composite group of commodities.


As already pointed out, the selection of the commodities to be
included in the construction of the index, and of the markets or
stores from which the data on prices and quantities are collected,
is not made on the basis of a random sample, because randomness
would be at the cost of representativeness ; it is done on the basis
of a stratified-cum-deliberate (purposive) sample. The commodities
are usually classified into relatively homogeneous groups (or strata) ;
from within each group (or stratum) the more important commodities
are selected first, and from the remainder as many more commodities
are selected at random as is consistent with the resources at our
disposal in terms of time and money. Deliberate or purposive
sampling makes the sample subjective in nature ; consequently some
personal bias is likely to creep in, and attempts should be made to
minimise this error.


(3) Due to the dynamic pace of events and scientific advancements
these days, there is a rapid change in the tastes, customs and
fashions, and consequently in the consumption patterns of the
various commodities among the people in a society. Accordingly,
index numbers (which require that the items and their qualities
should remain the same over a period of time) may not be able to keep
pace with the changes in the nature and quality of the commodities
and hence may not be really representative.

(4) There is no formula which measures the price change or
quantity change of a given body of data with exactitude or
perfection. Accordingly, there is inherent in each index an error
termed as ‘formula error’. For example, Laspeyres’ index has an
upward bias while Paasche’s index has a downward bias. A
measure of formula error is provided by the difference between
these two indices. Moreover, index numbers are a special type of
averages, and the type of average used for their construction has its
own field of utility and limitations. Thus, the index numbers may
not really be representative.
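As an illustration of point (4), the following sketch computes Laspeyres’ and Paasche’s price indices for the same data and takes their difference as a rough measure of formula error. The prices and quantities are hypothetical, invented only for this illustration; with other data the gap may be larger, smaller, or of opposite sign.

```python
# Hypothetical base-year (0) and current-year (1) prices and quantities
# for three commodities; these figures are not taken from the text.
p0 = [2, 5, 4]
q0 = [10, 4, 5]
p1 = [3, 6, 5]
q1 = [8, 5, 6]


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


# Laspeyres uses base-year quantities as weights.
laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100
# Paasche uses current-year quantities as weights.
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100

formula_error = laspeyres - paasche
print(round(laspeyres, 2), round(paasche, 2), round(formula_error, 2))
```

Here the two indices measure the same price change yet disagree by a couple of index points, which is the kind of discrepancy the text calls formula error.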

(5) By suitable choice of the base year, commodities, price and
quantity quotations, index numbers are liable to be manipulated
by unscrupulous and selfish persons to obtain the desired results.


In spite of all the above limitations, index numbers, if properly
constructed and not deliberately distorted, are extremely useful
‘economic barometers’.

EXERCISE 10.5 
1. (a) What is a cost of living index number ? What does it measure ?
Discuss briefly its uses and limitations.
(b) What do you understand by cost of living index numbers ? Describe
briefly the various steps involved in their construction.


(c) “Cost of living index number is essentially a consumer price index.”
Discuss. State the important steps involved in its construction. What are its
uses ?


2. (a) What are the points that are taken into consideration in choosing
the base and determining the weights in the preparation of cost of living index
numbers ?
[Osmania Uni. B.Com. (Hons.), Nov. 1981]
(b) Give a detailed account of the method of construction of a Consumer
Price Index. Interpret the formula you will use in this connection.
[Lucknow Uni. B.Com., 1981]


3. What is an Index Number ? Describe the general lines on which
you would proceed to construct a cost of living index for factory workers in an
industrial area.
(Punjab Uni. B.Com., 1981)


4. How does the method of construction of a consumer price index
differ from that of the construction of a wholesale price index ? Explain by
taking an illustration.
(Lucknow Uni. B.Com., 1982)


5. Calculate cost of living Index Number from the following data :


Group    Index    Weight
A        360      48
B        220      12
C        230      9
D        160      12
E        190      15


[Andhra Pradesh Uni. B.Com., April 1982]
Ans. 278.75


6. In the construction of a certain Cost of Living Index Number,
the following group index numbers were found. Calculate the Cost of Living
Index Number by
(i) the weighted arithmetic mean ; and
(ii) the weighted geometric mean :

Group                Index Number    Weight
Food                 352             48
Fuel and Lighting    200             10
Clothing             230             8
House Rent           160             12
Miscellaneous        190             15

[I.C.W.A. (Final), June 1979]
Ans. (i) 274.26, (ii) 261.1
7. The following information relating to workers in an industrial
town is given :

Items of consumption             Consumer Price Index    Proportion of
                                 in 1970 (1960 = 100)    expenditure on the items
(i) Food, drinks and tobacco     225                     52%
(ii) Clothing                    175                     8%
(iii) Fuel and Lighting          155                     10%
(iv) Housing                     250                     14%
(v) Miscellaneous                150                     16%


Average wage per month in 1960 was Rs. 200. What should be the
average wage per month in 1970 in that town so that the standard
of living of the workers does not fall below the 1960 level ?


[Delhi Uni. B.A. (Econ. Hons.), 1975]
Ans. Rs. 411


8. A textile worker in the city of Ahmedabad earns Rs. 750 p.m. The
cost of living index for January, 1986, is given as 160. Using the following data
find out the amounts he spends on (i) Food and (ii) Rent.


Group                    Expenditure (Rs.)    Group Index
(i) Food                 ?                    190
(ii) Clothing            125                  181
(iii) Rent               ?                    140
(iv) Fuel & Lighting     100                  118
(v) Miscellaneous        75                   101


[Delhi Uni. B.Com. (Hons. II), 1986]


Ans. (i) Rs. 300, (ii) Rs. 150.


9. ... Food ..., Rent 2, Clothing 24, ... Calculate
the index number for a date when the increase in prices of the
various items over prices of July, 1968 = 100 were respectively
..., 57, 90, 75 and 88.

10. In calculating a certain cost of living index number, the following
weights were used : Food 15, Clothing 3, Rent 4, Fuel and light 2,
Miscellaneous 1. Calculate the index for a date when the average percentage
increases in prices of items in the various groups over the base period were
32, 54, 47, 78 and 58 respectively.

Suppose a business executive was earning Rs. 2,050 in the base period.
What should be his salary in the current period if his standard of living is to
remain the same ?
(Punjab Uni. B.Com., April 1980)
Ans. 141.76 ; Rs. 2,906.08


11. The following table gives the cost of living index numbers for
different commodity groups together with respective weights for 1984 (Base
1961).


Group           Food    Clothing    Fuel and Lighting    Rent    Misc.
Group Index     425     475         300                  400     250
Group Weight    62      4           6                    12      16


Obtain the overall cost of living index number. Suppose a person was
earning Rs. 600 in 1961, what should be his salary in 1984 if his standard of
living in that year is to be the same as in 1961 ?
[I.C.W.A. (Intermediate), December 1985]
Ans. 388.5 ; Rs. 2,331.00


12. The relative importance of the following eight groups of family
expenditure was found to be : food 348, rent 88, clothing 97, fuel and light
..., household durable goods 71, miscellaneous goods 35, services 79, drink
and tobacco 217. The corresponding increase in price for Oct. 1975 gave the
following values : 25, 1, 22, 18, 14, 13, ? and 4. Calculate the percentage
increase in group ‘services’, if the percentage increase for whole group is 15.278.
[Bombay Uni. B.Com., 1975]
Ans.


13. From some given data, the retail price index based on five items,
viz., Food, Rent and Rates, Fuel and Light, Clothing and Miscellaneous, was
calculated as 205. Percentage increases in prices over the base period are given
below :
Rent and Rates 60, Clothing 210, Fuel and Light 120,
Miscellaneous 130.
Calculate the percentage increase in the Food Group, given that the
weights of different items are as follows :

Food 60, Rent and Rates 16, Fuel and Light 8, Clothing 12,
Miscellaneous 4, All items 100.

Ans. 98.4% increase in food group.


14. The group indices and the corresponding weights for the working
class cost of living index numbers in an industrial city for the years 1976 and
1980 are given below :

                              Group Index
Group     Group Weight     1976     1980
Food т 370 380 
i 43 504 
469 336 
16
Compute the cost of living indices for the two years 1976 and 1980. If a
worker was getting Rs. 300 per month in 1976, do you think that he should be
given some extra allowance so that he can maintain his 1976 standard of
living ? If so, what should be the minimum amount of this extra allowance ?


[I.C.W.A. (Final), June 1981]
Ans. 353.20 ; 351.58. No extra allowance should be given.


15. Labour and capital are used in two different proportions in products
A and B, but the price of each input is equal for both products. On the basis
of the information given in the attached table, prepare, for the year 1980,
separate price indices for labour and capital.


                                     Product A    Product B
Weight for labour                    60           70
Weight for capital                   40           30
Cost of Production Index for 1980
(Base year 1970 = 100)               ...          ...


[Delhi Uni. B.A. (Econ. Hons.), 1982]
Ans. P01 (Labour) = 300 ; P01 (Capital) = 400


16. An enquiry into the budgets of the middle class families in a certain 
city in India gave the following information : 


Expenses on         Food    Fuel    Clothing    Rent    Misc.
                    35%     10%     20%         15%     20%
Price 1975 (Rs.)    150     25      75          30      ...
Price 1976 (Rs.)    145     23      65          30      ...
What is the cost of living index number of 1976 as compared with that of
1975 ? [Osmania Uni. B.Com. (Hons.), Nov. 1981 ;
I.C.W.A. (Final), June 1978 ; Kerala Uni. B.Com., April 1982]
Ans. 102.86


17. Use the formula $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$, and find the consumer price index
for 1980 with 1969 as base with the help of the following data. Interpret the
Index number so obtained.


Item No.    Quantity in 1969    Price in 1969    Price in 1980
            (q0)                (p0)             (p1)
1 75 34 9:6 
2 16 2:5 8:5 
3 15 76 12:6 
4 22 45 TS 
5 13 TO 11:0 
6 3 20 40 


(Lucknow Uni. B.Com., 1981)


18. Construct the consumer price index numbers for 1979 and 1980 from
the indices given below :


Year    Food    Rent    Clothing    Fuel    Misc.
1978    100     100     100         100     100
1979    102     100     103         100     97
1980    106     102     105         101     98

Assume the following weights for the different groups :


Food    Rent    Clothing    Fuel    Misc.
60      16      12          8       4
(Himachal Pradesh Uni. M.Com., 1982)
Ans. For 1979 : 101.44 ; For 1980 : 104.52


19. If the Consumer Price Index (for the same class of people and with
same base year) is higher for Delhi than that for Bombay, does it necessarily
mean that Delhi is more expensive (for this class of people) than Bombay ? Give
reasons in support of your answer.
[Delhi Uni. B.Com. (Hons.), 1978]


20. The sub-group indices of the consumer price index number for
urban non-manual employees of an industrial centre for a particular year (with
base 1960 = 100) were :

Food    Clothing    Fuel and Light    House Rent    Miscellaneous
200     130         120               150           140

The weights are 60, 8, 7, 10 and 15 respectively. It is proposed to fix
dearness allowance in such a way as to compensate fully the rise in the price of
food and house rent.


What should be the dearness allowance expressed as a percentage of
wage ?
[Delhi Uni. M.Com., 1970]


Ans. The consumer should be granted an increase of 65% in his base
year salary.
21. “The estimated per capita income for India in 1931-32 was Rs. 65.
The estimate for 1972-73 was Rs. 650. In 1972-73, every Indian was, therefore,
10 times more prosperous than in 1931-32”. Comment.
(Delhi Uni. B.Com., 1978)


22. What is an index number ? Describe the limitations of index
numbers.
(Punjab Uni. B.Com. II, Sept. 1982)


23. “Index numbers are used to measure the changes in some quantity
which we cannot observe directly”.


Explain the above statement and point out the uses and limitations of
index numbers.
(Guru Nanak Dev Uni. B.Com., Sept. 1970)




Time Series Analysis

11.1. Introduction. A time series is an arrangement of
statistical data in a chronological order, i.e., in accordance with its
time of occurrence. It reflects the dynamic pace of movements of
a phenomenon over a period of time. Most of the series relating
to Economics, Business and Commerce, e.g., the series relating to
prices, production and consumption of various commodities ;
agricultural and industrial production, national income and foreign
exchange reserves ; investment, sales and profits of business ... ;
bank deposits and bank clearings, ...

“... of readings belonging to different time periods, of some
economic variable or composite of variables”.

Mathematically, a time series is defined by the functional
relationship

$$y = f(t)$$

where y is the value of the phenomenon under consideration at
time t. For example, (i) the ... or a place in different years (t),
(ii) the number of births and deaths (y) in different months (t) of
the year, (iii) ... Thus, if the values of a phenomenon or variable at
times $t_1, t_2, \ldots, t_n$ are $y_1, y_2, \ldots, y_n$ respectively, then the series

$$\begin{array}{c|ccccc} t & t_1 & t_2 & t_3 & \cdots & t_n \\ \hline y & y_1 & y_2 & y_3 & \cdots & y_n \end{array}$$

constitutes a time series. Thus, a time series invariably gives a
bivariate distribution, one of the two variables being time (t) and
the other being the value (y) of the phenomenon at different points
of time. The values of t may be given yearly, monthly, weekly, daily
or even hourly, usually but not always at equal intervals of time.
As already discussed in Chapter 4, the graph of a time series, known
as Historigram, is obtained on plotting the data on a graph paper
taking the independent variable t along the x-axis and the dependent
variable y along the y-axis.


11.2. Components of a Time Series. If the values of a
phenomenon are observed at different periods of time, the values so
obtained will show appreciable variations or changes. These
fluctuations are due to the fact that the value of the phenomenon
is affected not by a single factor but by the cumulative effect
of a multiplicity of factors pulling it up and down. However, if the
various forces were in a state of equilibrium, then the time series
would remain constant. For example, the sales (y) of a product are
influenced by (i) advertisement expenditure, (ii) the price of the
product, (iii) the income of the people, (iv) other competitive
products in the market, (v) tastes, fashions, habits and customs of
the people, and so on. Similarly, the price of a particular product
depends on its demand, various competitive products in the market,
raw materials, transportation expenses, investment, and so on. The
various forces affecting the values of a phenomenon in a time series
may be broadly classified into the following four categories,
commonly known as the components of a time series, some or all of
which are present (in a given time series) in varying degrees.


(a) Secular Trend or Long-Term Movement (T).


(b) Periodic Movements or Short-Term Fluctuations :
(i) Seasonal Variations (S), (ii) Cyclical Variations (C).


(c) Random or Irregular Variations (R or I).


The value (y) of a phenomenon observed at any time (t) is the
net effect of the interaction of above components. We shall explain
these components briefly in the following sections.

11.2.1. Secular Trend. The general tendency of the time
series data to increase or decrease or stagnate during a long period
of time is called the secular trend or simple trend. This phenomenon
is usually observed in most of the series relating to Economics and
Business, e.g., an upward tendency is usually observed in time series
relating to population, production and sales of products, prices,
income, money in circulation, etc., while a downward tendency is
observed in the time series relating to deaths, epidemics, etc., due to
the advancement in medical technology, improved medical facilities,
better sanitation, diet, etc. According to Simpson and Kafka :

“Trend, also called secular or long-term trend, is the basic
tendency of a series ... to grow or decline over a period of time. The
concept of trend does not include short-range oscillations, but rather
the steady movement over a long time.”

Remarks 1. It should be clearly understood that trend is the
general, smooth, long-term average tendency. It is not necessary that
the increase or decline should be in the same direction throughout
the given period. It may be possible that different tendencies of
increase, decrease or stability are observed in different sections of
time. However, the overall tendency may be upward, downward or
stable. Such tendencies are the result of the forces which are more
or less constant for a long time or which change very gradually and
continuously over a long period of time, such as the change in the
population, tastes, habits and customs of the people in a society, and
so on. They operate in an evolutionary manner and do not reflect
sudden changes. For example, the effect of population increase over
a long period of time on the expansion of various sectors like
agriculture, industry, education, textiles, etc., is a continuous but gradual
process. Similarly, the growth or decline in a number of economic
time series is the interaction of forces like advances in production
technology, large-scale production, improved marketing management
and business organisation, the invention and discovery of new
natural resources and the exhaustion of the existing resources, and so
on, all of which are gradual processes.

2. The term ‘long period of time’ is a relative term and
cannot be defined exactly. It would very much depend on the nature
of the data. In certain phenomena, a period as small as a few hours
may be sufficiently long while in others even a period as long as 3-4
years may not be sufficient. For example, to have an idea about
the production of a particular product (agricultural or industrial
production), an increase over the past 20 or 30 months will not
reflect a secular change, for which we must have data for 7-8 years.
In such a phenomenon, the values for a short period (2-3 years) are
unduly affected by cyclic variation (discussed later) and will not
reveal the true trend. In order to have a true picture of the trend, the
time series values must be examined over a period covering at least
two or three complete cycles.

On the other hand, if we count the bacterial population
(living organisms) of a culture subjected to strong germicide
every 20 seconds for 1 hour, then the set of 180 readings showing
a general pattern would be termed as secular movement.

3. Linear and Non-Linear (Curvi-Linear) Trend. If the time
series values plotted on a graph cluster more or less round a straight
line, the trend exhibited by the time series is termed as Linear,
otherwise Non-Linear (Curvi-Linear) ; see Figures 11.1 and 11.2. In a
straight line trend, the time series values increase or decrease more
or less by a constant absolute amount, i.e., the rate of growth (or
decline) is constant. Although, in practice, linear trend is commonly
used, it is rarely observed in economic and business data. In an
economic and business phenomenon, the rate of growth or decline is
not of constant nature throughout but varies considerably in different
sectors of time. Usually, in the beginning, the growth is slow, then
rapid, which is further accelerated for quite some time, after which
it becomes stationary or stable for some period and finally retards
slowly.

[Fig. 11.1 : Linear Trend. Fig. 11.2 : Non-Linear Trend. In both
figures the variable is plotted against years.]
4. It is not necessary that all the series must exhibit a rising
or declining trend. Certain phenomena may give rise to time series
whose values fluctuate round a constant reading which does not
change with time, e.g., the series relating to temperature or
barometric readings (pressure) of a particular place.


5. Uses of Trend 


(i) The study of the data over a long period of time enables
us to have a general idea about the pattern of the behaviour of the
phenomenon under consideration. This helps in business forecasting
and planning future operations. For example, if the time series
data for a particular phenomenon exhibits a trend in a particular
direction, then under the assumption that the same pattern will
continue in the near future (an assumption which is quite reasonable
unless there are some fundamental and drastic changes in the forces
affecting the phenomenon) we can forecast the values of the
phenomenon for future also. The accuracy of the trend curve or
trend equation or the estimates obtained from them will depend on
the reliability of the type of trend fitted to the given data. (For
details, see Measurement of Trend : Least Square Method.) The
trend values are of paramount importance to a businessman in
providing him the rough estimates of the values of the phenomenon in
the near future. For instance, an idea about the approximate sales
or demand for a product is extremely useful to a businessman in
planning future operations and formulating policies regarding
inventory, production, etc.


(ii) By isolating trend values from the given time series [by
dividing the given time series values by the trend values or
subtracting trend values from the given time series values ; see Models
(11.1) and (11.2) discussed later], we can study the short-term and
irregular movements.


(iii) Trend analysis enables us to compare two or more time
series over different periods of time and draw important conclusions
about them.
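Use (i) above can be sketched as follows, assuming a straight-line trend fitted by least squares. The sales figures are hypothetical, the function name is our own, and the forecast holds only under the stated assumption that the past pattern continues.

```python
def fit_linear_trend(y):
    """Least-squares line y = a + b*t for t = 0, 1, ..., len(y) - 1."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx              # average change per period
    a = y_mean - b * t_mean    # trend value at t = 0
    return a, b


# Hypothetical annual sales for seven consecutive years.
sales = [12, 15, 14, 18, 21, 20, 24]
a, b = fit_linear_trend(sales)

# Trend values for the observed years and a forecast for the next year.
trend_values = [a + b * t for t in range(len(sales))]
forecast_next = a + b * len(sales)
print(round(a, 3), round(b, 3), round(forecast_next, 3))
```

The fitted `trend_values` are also what use (ii) needs: dividing or subtracting them from the original series leaves the short-term and irregular movements.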
11.2.2. Short-Term Variations. In addition to the long-term
movements there are inherent in most of the time series a number
of forces which repeat themselves periodically or almost
periodically over a period of time and thus prevent the smooth flow of the
values of the series in a particular direction. Such forces give rise
to the so-called short-term variations which may be classified into
the following two categories :


(i) Seasonal Variations (S), and
(ii) Cyclical Variations (C).

Seasonal Variations. These variations in a time series are
due to the rhythmic forces which operate in a regular and periodic
manner over a span of less than a year, i.e., during a period of 12
months, and have the same or almost the same pattern year after year.
Thus seasonal variations in a time series will be there if the data
are recorded quarterly (every three months), monthly, weekly, daily,
hourly, and so on. Although in each of the above cases the
amplitudes of the seasonal variations are different, all of them have the
same period, viz., 1 year. Thus in a time series data where only
annual figures are given, there are no seasonal variations. Most of the
economic time series are influenced by seasonal swings, e.g., prices,
production and consumption of commodities ; sales and profits in
a departmental store ; bank clearings and bank deposits, etc., are
all affected by seasonal variations. The seasonal variations may be
attributed to the following two causes :


(i) Those resulting from natural forces. As the name suggests,
the various seasons or weather conditions and climatic changes play
an important role in seasonal movements. For instance, the sales of
umbrellas pick up very fast in rainy season ; the demand for electric
fans goes up in summer season ; the sales of ice and ice-cream
increase very much in summer ; the sales of woollens go up in winter ;
all being affected by natural forces, viz., weather or seasons.
Likewise, the production of certain commodities such as sugar,
rice, pulses, eggs, etc., depends on seasons. Similarly, the prices of
agricultural commodities always go down at the time of harvest and
then pick up gradually.


(ii) Those resulting from man-made conventions. These
variations in a time series within a period of 12 months are due to
habits, fashions, customs and conventions of the people in the
society. For instance, the sales of jewellery and ornaments go up in
marriages ; the sales and profits in departmental stores go up
considerably during marriages, and festivals like Diwali, Dussehra
(Durga Pooja), Christmas, etc. Such variations operate in a regular
spasmodic manner and recur year after year.


The main objective of the measurement of seasonal variations
is to isolate them from the trend and study their effects. A study
of the seasonal patterns is extremely useful to businessmen,
producers, sales-managers, etc., in planning future operations and in
the formulation of policy decisions regarding purchase, production,
inventory control, personnel requirements, selling and advertising
programmes. In the absence of any knowledge of seasonal
variations, a seasonal upswing may be mistaken as an indicator of better
business conditions while a seasonal slump may be mis-interpreted
as deteriorating business conditions. Thus, to understand the
behaviour of the phenomenon in a time series properly, the time
series data must be adjusted for seasonal variations. [This is done
by isolating them from trend and other components on dividing the
given time series values (y) by the seasonal variations (S). See
model (11.2).] This technique is called de-seasonalisation of data and
is discussed in detail later (c.f. § 11.6.5).
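The adjustment described above, dividing each value by its seasonal variation, can be sketched as follows. The quarterly sales and the seasonal indices (expressed as ratios around 1) are hypothetical and assumed to be already known; methods of estimating them are a separate matter.

```python
# Two years of hypothetical quarterly sales (Q1..Q4, Q1..Q4).
sales = [132, 99, 66, 143, 144, 108, 72, 156]

# Assumed seasonal indices for Q1..Q4, as ratios; they average to 1.
seasonal_index = [1.2, 0.9, 0.6, 1.3]

# De-seasonalised value = original value / seasonal index of its quarter.
deseasonalised = [
    y / seasonal_index[i % 4] for i, y in enumerate(sales)
]

print([round(v, 2) for v in deseasonalised])
```

In the raw figures the drop from the first to the third quarter could be mistaken for deteriorating business. After adjustment the first year is level and the second year stands uniformly higher, so the slump was purely seasonal.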


Cyclical Variations (C). The oscillatory movements in a time
series with period of oscillation greater than one year are termed as
cyclical variations. These variations in a time series are due to ups
and downs recurring after a period greater than one year. These
fluctuations, though more or less regular, are not necessarily
uniformly periodic, i.e., they may or may not follow exactly
similar patterns after equal intervals of time. One complete period,
which normally lasts from 7 to 9 years, is termed as a ‘cycle’. These
oscillatory movements in any business activity are the outcome of
the so-called ‘Business Cycles’ which are the four-phased cycles
comprising prosperity (boom), recession, depression and recovery
from time to time. These booms and depressions in any business
activity follow each other with steady regularity and the complete
cycle from the peak of one boom to the peak of next boom usually
lasts from 7 to 9 years. Most of the economic and business series,
e.g., series relating to production, prices, wages, investments, etc.,
are affected by cyclical upswings and downswings.


The study of cyclical variations is of great importance to
business executives in the formulation of policies aimed at stabilising the
level of business activity. A knowledge of the cyclic component
enables a businessman to have an idea about the periodicity of the
booms and depressions and accordingly he can take timely steps for
maintaining stable market for his product.


11.2.3. Random or Irregular Variations. Mixed up with
cyclical and seasonal variations, there is inherent in every time
series another factor called random or irregular variations. These
fluctuations are purely random and are the result of such unforeseen
and unpredictable forces which operate in an absolutely erratic and
irregular manner. Such variations do not exhibit any definite
pattern and there is no regular period or time of their occurrence,
hence they are named irregular variations. These powerful
variations are usually caused by numerous non-recurring factors like
floods, famines, wars, earthquakes, strikes and lockouts, epidemics,
revolution, etc., which behave in a very erratic and unpredictable
manner. Normally, they are short-term variations but sometimes their
effect is so intense that they may give rise to new cyclical or other
movements. Irregular variations are also known as episodic
fluctuations and include all types of variations in a time series data which
are not accounted for by trend, seasonal and cyclical variations.


Because of their absolutely random character, it is not possible
to isolate such variations and study them exclusively, nor can we
forecast or estimate them precisely. The best that can be done about
such variations is to obtain their rough estimates (from past
experience) and accordingly make provisions for such abnormalities
during normal times in business.


11.3. Analysis of Time Series. The time series analysis con- 
sists of : 


(i) Identifying or determining the various forces or influences
whose interaction produces the variations in the time series.


(ii) Isolating, studying, analysing and measuring them
independently, i.e., by holding other things constant.


The time series analysis is of great importance not only to a
businessman or an economist but also to people working in various
disciplines in natural, social and physical sciences. Some of its uses
are enumerated below :


(i) It enables us to study the past behaviour of the phenomenon
under consideration, i.e., to determine the type and nature of the
variations in the data.


(ii) The segregation and study of the various components is
of paramount importance to a businessman in the planning of future
operations and in the formulation of executive and policy decisions.


(iii) It helps to compare the actual current performance or
accomplishments with the expected ones (on the basis of the past
performances) and analyse the causes of such variations, if any.


(iv) It enables us to predict or estimate or forecast the
behaviour of the phenomenon in future which is very essential for
business planning.


(v) It helps us to compare the changes in the values of
different phenomena at different times or places, etc.


11.4. Mathematical Models for Time Series. The following
are the two models commonly used for the decomposition of a time
series into its components.


(i) Additive Model or Decomposition by Additive Hypothesis.
According to the additive model, the time series can be expressed
as :

$$Y = T + S + C + I \qquad (11.1)$$

or more precisely

$$Y_t = T_t + S_t + C_t + I_t \qquad (11.1a)$$

where $Y$ ($Y_t$) is the time series value at time t, and $T_t$, $S_t$, $C_t$ and
$I_t$ represent the trend, seasonal, cyclical and random variations at
time t. In this model $S = S_t$, $C = C_t$ and $I = I_t$ are absolute
quantities which can take positive and negative values so that :

$$\sum S = \sum S_t = 0, \text{ for any year,}$$
$$\sum C = \sum C_t = 0, \text{ for any cycle,}$$
$$\text{and } \sum I = \sum I_t = 0, \text{ in the long-term period.}$$

The additive model assumes that all the four components of
the time series operate independently of each other so that none of
these components has any effect on the remaining three. This
implies that the trend, however fast or slow it may be, has no effect
on the seasonal and cyclical components ; nor do seasonal swings
have any impact on cyclical variations and conversely. However,
this assumption is not true in most of the economic and business
time series where the four components of the time series are not
independent of each other. For instance, the seasonal or cyclical
variations may virtually be wiped off by a very sharp rising or
declining trend. Similarly strong and powerful seasonal swings may
intensify or even precipitate a change in the cyclical fluctuations.
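To make the additive model concrete, the following sketch builds a short quarterly series from assumed components and then removes a known trend by subtraction. All component values are hypothetical; in particular, the cyclical component is simply taken as zero over this short span.

```python
# Additive model Y_t = T_t + S_t + C_t + I_t on two years of quarterly data.
n = 8
trend = [100 + 2 * t for t in range(n)]       # T_t: steady linear growth
seasonal_pattern = [5, -3, -6, 4]             # S_t: absolute amounts, sum to 0 in a year
seasonal = [seasonal_pattern[t % 4] for t in range(n)]
cyclical = [0] * n                            # C_t: assumed absent here
irregular = [1, -1, 0, 0, -1, 1, 0, 0]        # I_t: sums to 0 over the period

y = [T + S + C + I
     for T, S, C, I in zip(trend, seasonal, cyclical, irregular)]

# Eliminating a known trend by subtraction leaves S_t + C_t + I_t.
detrended = [yt - Tt for yt, Tt in zip(y, trend)]

print(y)
print(detrended)
print(sum(seasonal[:4]), sum(irregular))
```

Note that the seasonal amounts here are the same in both years although the trend has risen, which is exactly the independence assumption the text questions for economic data.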

(ii) Multiplicative Model or Decomposition by Multiplicative
Hypothesis. Keeping the above points in view, most of the economic
and business time series are characterised by the following
classical multiplicative model :

Y = T × S × C × I                                ...(11.2)
or more precisely
Y_t = T_t × S_t × C_t × I_t                      ...(11.2a)


This model assumes that the four components of the time series
are due to different causes but they are not necessarily independent
and they can affect each other. In this model S, C and I are not
viewed as absolute amounts but rather as relative variations. Except
for the trend component T, the other components S, C and I are
expressed as rates or indices fluctuating above or below 1 such that
the geometric means of all the S = S_t values in a year, C = C_t values
in a cycle or I = I_t values in a long-term period are unity.


Taking logarithms of both sides in (11.2) we get :
log Y = log T + log S + log C + log I            ...(11.3)


which is nothing but the additive model fitted to the logarithms of 
the given time series values. 
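The relation between the multiplicative model and its logarithmic (additive) form can be checked numerically. The following is only a small sketch; the component values are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical component values for a single time period (illustration only).
T, S, C, I = 120.0, 1.10, 0.95, 1.02

Y = T * S * C * I                                    # multiplicative model
log_sum = sum(math.log10(v) for v in (T, S, C, I))   # sum of the logs of the components

# The two printed numbers agree: log Y = log T + log S + log C + log I
print(round(math.log10(Y), 6), round(log_sum, 6))
```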
Remarks 1. Most of the time series relating to economic and
business phenomena conform to the multiplicative model (11.2).
In practice, the additive model (11.1) is rarely used.


2. Mixed Models. In addition to the additive and multiplicative
models discussed above, the components in a time series
may be combined in a large number of other ways. The different
models, defined under different assumptions, will yield different
results. Some of the mixed models resulting from different
combinations of additive and multiplicative models are given below :


Y = TCS + I                                      ...(11.4)
Y = TC + SI                                      ...(11.5)
Y = T + SCI                                      ...(11.6)
Y = T + S + CI                                   ...(11.7)


3. The model (11.1) or (11.2) can be used to obtain a
measure of one or more of the components by elimination, viz.,
subtraction or division. For example, if the trend component (T) is
known then, using the multiplicative model, it can be isolated from
the given time series to give :

S × C × I = Y / T = Original Values / Trend Values        ...(11.8)


Thus for the annual data, for which the seasonal component S
is not there, we have

Y = T × C × I
⇒ C × I = Y / T                                  ...(11.8a)


In the following sections we shall discuss various techniques for 
the measurement of different components of a time series. 


11.5. Measurement of Trend. The following are the four
methods which are generally used for the study and measurement of
the trend component in a time series.


(i) Graphic (or Free-hand Curve Fitting) Method.
(ii) Method of Semi-Averages.
(iii) Method of Curve Fitting by the Principle of Least Squares.
(iv) Method of Moving Averages.


11.5.1. Graphic or Free Hand Curve Fitting Method. This is the
simplest and the most flexible method of estimating the secular trend
and consists in first obtaining a historigram by plotting the time
series values on a graph paper and then drawing a free hand smooth
curve through these points so that it accurately reflects the
long-term tendency of the data. The smoothing of the curve eliminates
the other components, viz., seasonal, cyclical and random variations.
In order to obtain proper trend line or curve, the following points
may be borne in mind :


(i) It should be smooth. 


(ii) The number of points above the trend curve/line should be 
more or less equal to the number of points below it. 


(iii) The sum of the vertical deviations of the given points
above the trend line should be approximately equal to the sum of 
vertical deviations of the points below the trend line so that the total 
positive deviations are more or less balanced against total negative 
deviations. 


(iv) The sum of the squares of the vertical deviations of the 
given points from the trend line/curve is minimum possible. 


[The points (iii) and (iv) conform to the principle of average
(Arithmetic mean) because the algebraic sum of the deviations of
the given observations from their arithmetic mean is zero and the
sum of the squared deviations is minimum when taken about mean.]


(v) If the cycles are present in the data then the trend line
should be so drawn that :


(a) It has equal number of cycles above and below it. 


(b) It bisects the cycles so that the areas of the cycles above 
and below the trend line are approximately same. 


(vi) The minor short-term fluctuations or abrupt and sudden
variations may be ignored.


Merits. (i) It is a very simple and time-saving method and does
not require any mathematical calculations.


(ii) It is a very flexible method in the sense that it can be used
to describe all types of trend, linear as well as non-linear.


Demerits. (i) The strongest objection to this method is that 
it is highly subjective in nature. The trend curve so obtained will 
very much depend on the personal bias and judgement of the in- 
vestigator handling the data and consequently different persons will 
obtain different trend curves for the same set of data. Thus, a 
proper and judicious use of this method requires great skill and 
expertise on the part of the investigator and this very much restricts 
the popularity and utility of this method. This method, though
simple and flexible, is seldom used in practice because of the inherent
bias of the investigator.


(ii) It does not help to measure trend. 


(iii) Because of the subjective nature of the free hand trend 
curve, it will be dangerous to use it for forecasting or making 
predictions. 


11.5.2. Method of Semi-Averages. As compared with the
graphic method, this method has a more objective approach. In this
method, the whole time series data is classified into two equal parts
w.r.t. time. For example, if we are given the time series values for
10 years from 1965 to 1974 then the two equal parts will be the data
corresponding to periods 1965 to 1969 and 1970 to 1974. However,
in case of odd number of years, the two equal parts are obtained
on omitting the value for the middle period. Thus, for example,
for the data for 9 years from 1970 to 1978, the two parts will be the
data for years 1970 to 1973 and 1975 to 1978, the value for the
middle year, viz., 1974 being omitted. Having divided the given
series into two equal parts, we next compute the arithmetic mean
of time-series values for each half separately. These means are
called semi-averages. Then these semi-averages are plotted as
points against the middle point of the respective time periods
covered by each part. The line joining these points gives the straight
line trend fitting the given data.


As an illustration, for the time series data for 1965 to 1974
we have :


                         Part I                   Part II
Period :                 1965 to 1969             1970 to 1974
Semi-Average :           x̄1 = (sum of the five    x̄2 = (sum of the five
                         values for 1965-69)/5    values for 1970-74)/5
Middle of time period :  1967                     1972


x̄1 is plotted against 1967 and x̄2 is plotted against 1972. The
trend line is obtained on joining the points so obtained, viz., the
points (1967, x̄1) and (1972, x̄2) by a straight line. In the above
case the two parts consisted of an odd number of years, viz., 5 and
hence the middle time period is computed easily. However, if the
two halves consist of an even number of years, as in the next case
given above, viz., the years 1970 to 1973 and 1975 to 1978, the centring
of the average time period is slightly difficult. In this case x̄1 (the
mean of the values for the years 1970 to 1973) will be plotted against
the mean of the two middle years of the period 1970 to 1973, viz.,
the mean of the years 1971 and 1972. Similarly x̄2 will be plotted
against the mean of the years 1976 and 1977.


Merits. (i) An obvious advantage of this method is its 
objectivity in the sense that it does not depend on personal judge- 
ment and everyone who uses this method gets the same trend line 
and hence the same trend values. 


(ii) It is easy to understand and apply as compared with the 
moving average or least square methods of measuring trend. 


(iii) The line can be extended both ways to obtain future 
or past estimates. 


Limitations. (i) This method assumes the presence of linear 
trend (in the time series values) which may not exist. 


(ii) The use of arithmetic mean (for obtaining semi-averages)
may also be questioned because of its limitations.
Accordingly, the trend values obtained by this method and the
predicted values for future are not precise and reliable.


Example 11.1. Apply the method of semi-averages for deter- 
mining trend of the following data and estimate the value for 1980 : 


Years Sales Years Sales 
(thousand units) (thousand units) 
1973 20 1976 30 
1974 24 1977 28 
1975 22 1978 32 


If the actual figure of sales for 1980 is 35,000 units, how do you
account for the difference between the figures you obtain and the
actual figures given to you ?

(Punjab Uni. B.Com., Sept., 1980) 


Solution. Here n = 6, and hence the two parts will be 1973
to 1975 and 1976 to 1978.


Year    Sales               3-Yearly       Semi-Average
        (thousand units)    Semi-Totals    (A.M.)

1973    20
1974    24                  66             66/3 = 22
1975    22
1976    30
1977    28                  90             90/3 = 30
1978    32

Here the semi-average 22 is to be plotted against the mid- 
year of first part, i.e., 1974 and the semi-average 30 is to be plotted 
against the mid-year of second part, viz., 1977. The trend line is 
shown in the following diagram. 


[Figure : graph of the original data and the trend line obtained by the method of semi-averages.]

Remark. The trend values for different years can be read
from the trend line graph. Alternatively, the increment in the trend
value of sales (thousand units) for 3 years from 1974 to 1977 is
30 - 22 = 8 ('000 units). Hence the yearly increment in sales is

8/3 = 2.667 ('000 units).


Now the trend value of sales for 1974 is the average of first
part, viz., 22 ('000 units) and for 1977 is 30 ('000 units). Hence
using the fact that the yearly increment in sales is 2.667 ('000 units),
the trend values for sales of various years can be obtained as shown
below.


COMPUTATION OF TREND VALUES


Year    Trend Values ('000 units)        Year    Trend Values ('000 units)
1973    22 - 2.667 = 19.333              1977    30
1974    22                               1978    30 + 2.667 = 32.667
1975    22 + 2.667 = 24.667              1979    32.667 + 2.667 = 35.334
1976    24.667 + 2.667 = 27.334          1980    35.334 + 2.667 = 38.001


Thus the estimated (trend) value for sales in 1980 is 38,001
units. This trend value differs from the given value of 35,000 units
because it has been obtained under the assumption that there is a
linear relationship between the given time series values which, in
this case (as is obvious from the graph of the original data) is not
true. Moreover, in computing the trend value the effects of seasonal,
cyclical and irregular variations have been completely ignored
while the observed values are affected by these factors.
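The semi-average computations of Example 11.1 can be sketched in a few lines of code. The function names below (`semi_average_line`, `trend_value`) are hypothetical helpers introduced only for this illustration; the data are those of the example.

```python
def semi_average_line(years, values):
    """Return (t1, m1, slope) for the line through the two semi-average points.

    With an odd number of periods the middle period is omitted.
    """
    n = len(years)
    half = n // 2
    first_t, first_v = years[:half], values[:half]
    second_t, second_v = years[n - half:], values[n - half:]
    t1, t2 = sum(first_t) / half, sum(second_t) / half   # middle of each part
    m1, m2 = sum(first_v) / half, sum(second_v) / half   # semi-averages
    slope = (m2 - m1) / (t2 - t1)                        # yearly increment
    return t1, m1, slope


def trend_value(year, t1, m1, slope):
    """Trend value read off the semi-average line for the given year."""
    return m1 + slope * (year - t1)


years = [1973, 1974, 1975, 1976, 1977, 1978]
sales = [20, 24, 22, 30, 28, 32]            # thousand units

t1, m1, slope = semi_average_line(years, sales)
estimate_1980 = trend_value(1980, t1, m1, slope)
print(round(slope, 3), round(estimate_1980, 3))
```

Working with the unrounded yearly increment 8/3 gives an estimate of 38 thousand units for 1980; the figure 38.001 in the table above arises only from rounding the increment to 2.667.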


Example 11.2. From the following series of annual data find
the trend line by the method of semi-averages. Also estimate the value
for 1979.


Year :          1970    1971    1972    1973    1974
Actual Value :  170     231     261     267     278
Year :          1975    1976    1977    1978
Actual Value :  302     299     298     340


Solution. Here the number of years is 9, i.e., odd. The two
parts will be 1970 to 1973 and 1975 to 1978, the value for the
middle year, viz., 1974 being ignored.


Year    Actual Value    4-Yearly Semi-Totals    Semi-Average

1970    170
1971    231
1972    261             929                     929/4 = 232.25 ≈ 232
1973    267
1974    278
1975    302
1976    299
1977    298             1239                    1239/4 = 309.75 ≈ 310
1978    340


The value 232 is plotted against the middle of the years 1971 and 
1972 and the value 310 is plotted against the middle of the years 
1976 and 1977. The trend line graph is shown below in Fig. 11.4.


Fig. 11.4


From the graph we see that the estimated (trend) value for 
1979 is 348. 


Aliter : Trend Value for 1979. From the calculations in the above
table we observe that the increment in the actual value from middle
of 1971-72 to the middle of 1976-77, i.e., for 5 years is 310 - 232 = 78.
Hence the yearly increment is 78/5. We also find that the average
trend value for middle of 1976-77 is 310. Since 1979 lies 2.5 years
after the middle of 1976-77, the trend value for 1979 is given by


310 + (78/5) × 2.5 = 310 + 39 = 349.
This value differs from the graph value of 348 obtained from
the trend line because we have obtained the calculations by
rounding the decimals.
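The effect of rounding the semi-averages in Example 11.2 can be seen by repeating the calculation with the unrounded values. This is a small sketch using only the data of the example; the variable names are illustrative.

```python
years = list(range(1970, 1979))
values = [170, 231, 261, 267, 278, 302, 299, 298, 340]

half = len(years) // 2                  # 4 ; the middle year 1974 is omitted
m1 = sum(values[:half]) / half          # first semi-average (unrounded)
m2 = sum(values[-half:]) / half         # second semi-average (unrounded)
t1 = sum(years[:half]) / half           # middle of 1971 and 1972
t2 = sum(years[-half:]) / half          # middle of 1976 and 1977

slope = (m2 - m1) / (t2 - t1)           # yearly increment
exact_1979 = m2 + slope * (1979 - t2)   # using unrounded semi-averages
rounded_1979 = 310 + (310 - 232) / 5 * (1979 - t2)   # using 232 and 310, as in the text

print(exact_1979, rounded_1979)
```

With the unrounded semi-averages 232.25 and 309.75 the estimate for 1979 comes to 348.5, which lies between the graphical value 348 and the value 349 obtained from the rounded figures.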


11.5.3 Method of Curve Fitting by the Principle of Least
Squares. The principle of least squares provides us an analytical
or mathematical device to obtain an objective fit to the trend
of the given time series. Most of the data relating to economic
and business time series conform to definite laws of growth or
decay and accordingly in such a situation analytical trend fitting
will be more reliable for forecasting and predictions. This technique
can be used to fit linear as well as non-linear trends.


Fitting of Linear Trend. Let the straight line trend between
the given time-series values (y) and time (t) be given by the
equation :

y = a + bt                                       ...(11.9)

Then for any given time 't', the estimated value y_e of y as
given by this equation is :

y_e = a + bt                                     ...(11.10)

As discussed in detail in Chapter 9 (Linear Regression
Analysis), the principle of least squares consists in estimating the
values of a and b in (11.9) so that the sum of the squares of errors
of estimate

E = Σ(y - y_e)² = Σ(y - a - bt)²                 ...(11.11)

is minimum, the summation being taken over given values of the
time series. This gives the normal equations or least square
equations for estimating a and b as

Σy = na + bΣt                                    ...(11.12)
Σty = aΣt + bΣt²                                 ...(11.13)

where n is the number of time series pairs (t, y). It may be seen
that equation (11.12) is obtained on taking sum of both sides in
equation (11.9). Equation (11.13) is obtained on multiplying
equation (11.9) by t and then summing both sides over the given values
of the series.


Solving (11.12) and (11.13) for a and b and substituting these
values in (11.9), we finally get the equation of the straight line
trend.
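The solution of the two normal equations for the straight line trend can be sketched as follows. The function name `fit_linear_trend` and the data are hypothetical and serve only as an illustration; the data are chosen to lie exactly on the line y = 2 + 3t, so that the fitted constants can be checked by inspection.

```python
def fit_linear_trend(t, y):
    """Solve the two normal equations of the straight line trend for a and b."""
    n = len(t)
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)                  # sum of t squared
    sty = sum(ti * yi for ti, yi in zip(t, y))      # sum of t times y
    # Eliminating a between the two normal equations gives b;
    # a then follows from the first normal equation.
    b = (n * sty - st * sy) / (n * stt - st * st)
    a = (sy - b * st) / n
    return a, b


# Hypothetical data lying exactly on y = 2 + 3t
t = [0, 1, 2, 3, 4]
y = [2, 5, 8, 11, 14]
a, b = fit_linear_trend(t, y)
trend = [a + b * ti for ti in t]
print(a, b)
```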


Remarks 1. The least square trend line is obtained so that :
(i) Σ(y - y_e) = 0
⇒ Σy = Σy_e,
i.e., the sum of the given values and the sum of trend values are equal
and (ii) Σ(y - y_e)² is minimum,
where y is the observed time series value and y_e is the corresponding
trend value given by the trend line (11.9).


2. The straight line trend implies that irrespective of the
seasonal and cyclical swings and irregular fluctuations, the trend
values increase or decrease by a constant absolute amount 'b' per
unit of time. Thus, if we are given the yearly figures for a time
series, then the coefficient 'b' in the line (11.9), which is nothing
but the slope of the trend line [c.f. equation of a line in the form :
y = mx + c], gives the annual rate of growth. Hence the linear trend
values form a series in arithmetic progression, the common difference
being 'b', the slope of the trend line.


After obtaining the trend line by the principle of least squares,
the trend values for different years can be obtained on substituting
the values of time t in the trend equation. However, from the
practical point of view, a much more convenient method of obtaining
the trend values of different years is to compute the trend value
for the first year from the equation of the trend line and then add
the value of 'b' to it successively (because the trend values form a
series in A.P. with common difference 'b').


Fitting a Second Degree (Parabolic) Trend. Let the
second degree parabolic trend be given by the equation :

y = a + bt + ct²                                 ...(11.14)

Then for any given value of t, the trend value is given by :

y_e = a + bt + ct²

Thus, if y_e is the trend value corresponding to an observed
value y, then according to the principle of least squares we have to
obtain the values of a, b and c in (11.14) so that

E = Σ(y - y_e)² = Σ(y - a - bt - ct²)²

is minimum for variations in a, b and c. Thus, the normal or least
square equations for estimating a, b and c are given by :

Σy = na + bΣt + cΣt²
Σty = aΣt + bΣt² + cΣt³                          ...(11.15)
Σt²y = aΣt² + bΣt³ + cΣt⁴

The first equation in (11.15) is obtained on summing both
sides of (11.14). The second equation is obtained on multiplying
(11.14) with t [the coefficient of second constant b in (11.14)] and
then summing both sides. The third equation is obtained on
multiplying both sides of (11.14) with t² [the coefficient of c, the third
constant in (11.14)] and then summing over values of the series.


For the given time series, Σy, Σty, Σt²y, Σt, Σt², Σt³ and Σt⁴ can
be calculated and equations (11.15) can be solved for a, b and c.
With these values of a, b, c, the parabolic curve (11.14) is the trend
curve of best fit.
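The three normal equations of the parabolic trend form a system of linear equations in a, b and c and may be solved by any elimination method. The following is only a sketch under that assumption: `solve3` and `fit_parabolic_trend` are hypothetical helper names, exact fractions are used to avoid rounding, and the data are invented so as to lie exactly on y = 1 + 2t + 3t².

```python
from fractions import Fraction


def solve3(A, rhs):
    """Solve a 3 x 3 linear system by Gauss-Jordan elimination (exact fractions)."""
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(3):
        pivot = next(r for r in range(col, 3) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][3] for r in range(3)]


def fit_parabolic_trend(t, y):
    """Set up and solve the three normal equations for a, b and c."""
    n = len(t)

    def s(p):                      # sum of t to the power p
        return sum(ti ** p for ti in t)

    def sy(p):                     # sum of (t to the power p) times y
        return sum((ti ** p) * yi for ti, yi in zip(t, y))

    A = [[n, s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    return solve3(A, [sy(0), sy(1), sy(2)])


# Hypothetical data lying exactly on y = 1 + 2t + 3t^2
t = [0, 1, 2, 3, 4]
y = [1 + 2 * ti + 3 * ti * ti for ti in t]
a, b, c = fit_parabolic_trend(t, y)
print(a, b, c)
```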




Remark. Change of Origin. Usually, the values of t are for
different years, say, 1970, 1971,..., 1979 and thus computation of Σt,
Σt², Σty, Σt²y, etc., and hence the solution of equations (11.12) and
(11.13) for linear trend or equations (11.15) for parabolic trend is
quite tedious. To simplify the calculations we may
shift the origin in the time variable according to our convenience
and assign it the consecutive values 0, 1, 2,...etc., the time period
allotted the value 0 is known as the period of origin. This might
slightly facilitate the solution of the normal equations. However,
the algebraic computations can be simplified to a great extent by
shifting the origin in time variable t to a new variable x in such a
way that Σx = Σx³ = 0. The technique is explained
below and can be applied only if the values of t are given to be
equidistant, say, at an interval h.


If n, the number of time series values, is odd, then the
transformation is :

x = (t - middle value) / (interval h)            ...(11.16)

Thus, if we are given yearly figures for, say, 1970, 1971, 1972,
..., 1976, i.e., n = 7, then

x = (t - middle year) / 1 = t - 1973             ...(*)

Putting t = 1970, 1971, 1972,..., 1976 in (*) we get x = -3,
-2, -1, 0, 1, 2 and 3 respectively so that Σx = Σx³ = 0.

If n is even then the transformation is :

x = [t - (arithmetic mean of two middle values)] / [½ (interval)]    ...(11.17)

Thus, if we are given the yearly values for, say, 1965, 1966,
1967, ..., 1972 then

x = (t - 1968.5) / (½) = 2t - 3937               ...(**)

Putting t = 1965, 1966,..., 1972 in (**) we get respectively :
x = -7, -5, -3, -1, 1, 3, 5, 7 so that Σx = Σx³ = 0.


The transformations (*) or (**) will always give Σx = 0 = Σx³,
and this reduces the algebraic calculations for the solution of normal
equations to a great extent. For example, for the linear trend

y = a + bx,                                      ...(11.18)

where x is defined either by (11.16) or (11.17) according as n is odd
or even, the normal equations for estimating a and b become :

Σy = na + bΣx and Σxy = aΣx + bΣx²

but Σx = 0. Hence these equations give :

Σy = na and Σxy = bΣx²
⇒ a = Σy / n and b = Σxy / Σx²                   ...(11.19)


With these values of a and b, (11.18) gives the equation of
the trend line.


Similarly, for the parabolic trend
y = a + bx + cx²,                                ...(11.20)
the normal equations for estimating a, b and c are
Σy = na + bΣx + cΣx²
Σxy = aΣx + bΣx² + cΣx³
Σx²y = aΣx² + bΣx³ + cΣx⁴
But Σx = 0 = Σx³. Hence these reduce to :

Σy = na + cΣx²                                   ...(i)
Σxy = bΣx²                                       ...(ii)
Σx²y = aΣx² + cΣx⁴                               ...(iii)

Equation (ii) gives the value of b = Σxy / Σx² and equations (i) and
(iii) can be solved simultaneously for a and c. With these values
of a, b and c the curve (11.20) becomes the parabolic trend curve
of best fit.
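The coding of the time variable for an odd and for an even number of equidistant periods can be sketched as below. The function name `coded_time` is hypothetical; the two sets of years are those used in the remark above.

```python
def coded_time(t):
    """Code equidistant time values so that the sums of x and of x cubed vanish."""
    n = len(t)
    h = t[1] - t[0]                         # common interval (assumed constant)
    if n % 2 == 1:
        middle = t[n // 2]                  # odd n : subtract the middle value
        return [(ti - middle) / h for ti in t]
    middle = (t[n // 2 - 1] + t[n // 2]) / 2
    return [(ti - middle) / (h / 2) for ti in t]   # even n : half-interval units


x_odd = coded_time(list(range(1970, 1977)))     # 1970, ..., 1976
x_even = coded_time(list(range(1965, 1973)))    # 1965, ..., 1972
print(x_odd)
print(x_even)
```

In both cases the coded values are symmetric about zero, which is why the sums of the odd powers of x vanish and the normal equations simplify.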
Merits and Limitations of Trend Fitting by Principle of Least
Squares


Merits. The method of least squares is the most popular and
widely used method of fitting mathematical functions to a given set
of observations. It has the following advantages :


(i) Because of its analytical or mathematical character, this 
method completely eliminates the element of subjective judgement 
or personal bias on the part of the investigator. 


(ii) Unlike the method of moving averages (discussed in the
next section, § 11.5.4), this method enables us to compute the trend
values for all the given time periods in the series.


(iii) The trend equation can be used to estimate or predict the
values of the variable for any period t in future or even in the
intermediate periods of the given series and the forecasted values
are also quite reliable.

(iv) The curve fitting by the principle of least squares is the
only technique which enables us to obtain the rate of growth per
annum, for yearly data, if linear trend is fitted. If we fit the linear
trend y = a + bx, where x is obtained from t by change of origin such
that Σx = 0, then for the yearly data, the annual rate of growth is b
or 2b according as the number of years is odd or even respectively.


Demerits. (i) The most serious limitation of the method is
the determination of the type of the trend curve to be fitted, viz., 
whether we should fit a linear or a parabolic trend or some other 
more complicated trend curve. 


(ii) The addition of even a single new observation necessitates 
all the calculations to be done afresh which is not so in the case of 
moving average method. 


(iii) This method requires more calculations and is quite tedi- 
ous and time consuming as compared with other methods. It is 
rather difficult for a non-mathematical person (layman) to under- 
stand and use. 


(iv) Future predictions or forecasts based on this method are 
based only on the long-term variations, i.e., trend and completely 
ignore the cyclical, seasonal and irregular fluctuations. 


We shall now discuss some numerical examples to illustrate 
the technique of curve fitting by the principle of least squares. 


Example 11.3. Fit a trend line to the following data by the
least squares method.


Year :                          1975    1977    1979    1981    1983
Production (in '000 tons) :     18      21      23      27      16
Estimate the production in 1980 and 1985.
[I.C.W.A. (Intermediate), December 1985]
Solution. Let the trend line be given by the equation :
y = a + bx                                       ...(*)
where x = t - 1979, i.e., origin is at 1979, x units = 1 year and y is
production (in '000 tons).


COMPUTATIONS FOR STRAIGHT LINE TREND


Year    Production          x = t - 1979    xy      x²      Trend Values ('000 tons)
(t)     (in '000 tons) (y)                                  y_e = 21 + 0.1x

1975    18                  -4              -72     16      21 - 0.4 = 20.6
1977    21                  -2              -42     4       21 - 0.2 = 20.8
1979    23                  0               0       0       21.0
1981    27                  2               54      4       21 + 0.2 = 21.2
1983    16                  4               64      16      21 + 0.4 = 21.4

Total   Σy = 105            Σx = 0          Σxy = 4 Σx² = 40

The normal equations for estimating a and b in (*) are given
by :
Σy = na + bΣx               Σxy = aΣx + bΣx²
⇒ 105 = 5a + 0              ⇒ 4 = a × 0 + 40b
⇒ a = 105/5 = 21            ⇒ b = 4/40 = 0.1


Substituting in (*), the trend line is given by the equation :
y_e = 21 + 0.1x                                  ...(**)
Substituting x = -4, -2, 0, 2, 4 in (**), we obtain the trend
values for the years 1975 to 1983 respectively. The trend values
are given in the last column of the above table.
The estimated production in 1980 is obtained on putting
x = t - 1979 = 1980 - 1979 = 1
in (**). Thus
(y_e)1980 = 21 + 0.1 × 1 = 21 + 0.1 = 21.1 ('000 tons).
The estimated production in 1985 is obtained on taking
x = t - 1979 = 1985 - 1979 = 6
in (**). Thus
(y_e)1985 = 21 + 0.1 × 6 = 21 + 0.6 = 21.6 ('000 tons).
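The calculations of Example 11.3 can be verified with a short sketch; since the origin is at the middle year 1979, the sum of x is zero and the constants follow from a = Σy/n and b = Σxy/Σx². The variable and function names are illustrative only.

```python
years = [1975, 1977, 1979, 1981, 1983]
production = [18, 21, 23, 27, 16]               # '000 tons

x = [t - 1979 for t in years]                   # origin at 1979, so the x values sum to zero
a = sum(production) / len(production)           # sum of y divided by n
b = (sum(xi * yi for xi, yi in zip(x, production))
     / sum(xi * xi for xi in x))                # sum of xy divided by sum of x squared


def trend(year):
    """Trend value ('000 tons) for the given year."""
    return a + b * (year - 1979)


print(a, b, round(trend(1980), 2), round(trend(1985), 2))
```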
Example 11.4. Below are given the figures of production (in
thousand tons) of a sugar factory :
Year :          1969    1970    1971    1972    1973    1974    1975
Production :    77      88      94      85      91      98      90
(i) Fit a straight line by the method of 'least squares' and show
the trend values.
(ii) What is the monthly increase in production ?
(iii) Eliminate the trend.
[Delhi Uni. B. Com. (Hons.), 1976, 78]


Solution.
COMPUTATION OF STRAIGHT LINE TREND

Year    Production          x = t - 1972    xy      x²      Trend Values
        (in '000 tons) (y)                                  (y_e)
1969    77                  -3              -231    9       83
1970    88                  -2              -176    4       85
1971    94                  -1              -94     1       87
1972    85                  0               0       0       89
1973    91                  1               91      1       91
1974    98                  2               196     4       93
1975    90                  3               270     9       95
Total   Σy = 623            Σx = 0          Σxy = 56  Σx² = 28  Σy_e = 623

(i) Let the straight line trend be given by :
y = a + bx                                       ...(*)


where the origin is July 1972 and x unit = 1 year. The normal
equations for estimating a and b in (*) are :


Σy = na + bΣx and Σxy = aΣx + bΣx²

⇒ a = Σy / n and b = Σxy / Σx²                   [since Σx = 0]
i.e., a = 623/7 = 89 and b = 56/28 = 2
Hence the straight line trend is given by the equation :
y_e = 89 + 2x                                    ...(**)


Putting x = -3, -2, -1, 0, 1, 2, 3 in (**) we get the trend
values for the years 1969 to 1975 respectively ; these are shown in
the last column of the above table. It may be checked that Σy = Σy_e, as
required by the principle of least squares.


(ii) From (*) it is obvious that the trend values increase by a
constant amount 'b' units every year. Thus the yearly increase in
production is 'b' units, i.e., 2 × 1000 = 2000 tons.


Hence the monthly increase in production


= 2000/12 = 166.67 tons.

(iii) Assuming multiplicative model, the trend values are eliminated
on dividing the given values (y) by the trend values (y_e). However,
if we assume the additive model, the trend eliminated values are
given by (y - y_e). The resulting values contain short-term (cyclic)
variations and irregular variations. Since the data are annual, the
seasonal variations are absent.


ELIMINATION OF TREND


        Trend eliminated values based on
Year    Additive Model          Multiplicative Model
        (y - y_e)               (y / y_e)

1969    77 - 83 = -6            77 ÷ 83 = 0.93
1970    88 - 85 = 3             88 ÷ 85 = 1.04
1971    94 - 87 = 7             94 ÷ 87 = 1.08
1972    85 - 89 = -4            85 ÷ 89 = 0.96
1973    91 - 91 = 0             91 ÷ 91 = 1.00
1974    98 - 93 = 5             98 ÷ 93 = 1.05
1975    90 - 95 = -5            90 ÷ 95 = 0.95
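The fitting of the trend line and the elimination of trend in Example 11.4 can be reproduced by the following sketch, which uses only the data of the example; the variable names are illustrative.

```python
production = [77, 88, 94, 85, 91, 98, 90]       # 1969 to 1975, '000 tons
x = [-3, -2, -1, 0, 1, 2, 3]                    # x = t - 1972

a = sum(production) / len(production)
b = (sum(xi * yi for xi, yi in zip(x, production))
     / sum(xi * xi for xi in x))
trend = [a + b * xi for xi in x]

# Trend eliminated values under the two models
additive = [y - ye for y, ye in zip(production, trend)]
multiplicative = [round(y / ye, 2) for y, ye in zip(production, trend)]

print(trend)
print(additive)
print(multiplicative)
```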
Example 11.5. The sales of a company in lakhs of rupees for
the years 1964-1971 are given below :


Years :  1964    1965    1966    1967    1968    1969    1970    1971
Sales :  550     560     555     585     540     525     545     585
(i) Find the linear trend equation.


(ii) Estimate the sales for the year 1963.
[Delhi U. B.A. (Econ. Hons. 1), 1984]


Solution. In this case, since n, the number of pairs, is even,
viz., 8, we shift the origin to the time which is the arithmetic mean
of the two middle times, viz., 1967 and 1968 and we take :


x = (t - 1967.5) / [½ (interval)] = 2(t - 1967.5) = 2t - 3935     ...(i)


Thus taking :
t = 1967, we get x = 3934 - 3935 = -1
t = 1966, we get x = 3932 - 3935 = -3


and so on. Let the linear trend equation between y and x be given


by :
y = a + bx                                       ...(ii)
COMPUTATIONS FOR LINEAR TREND


Year    Sales   x       xy      x²      Trend values
(t)     (y)                             y_e = 555.63 + 0.21x
1964    550     -7      -3850   49      555.63 - 7 × 0.21 = 554.16
1965    560     -5      -2800   25      555.63 - 5 × 0.21 = 554.58
1966    555     -3      -1665   9       555.63 - 3 × 0.21 = 555.00
1967    585     -1      -585    1       555.63 - 1 × 0.21 = 555.42
1968    540     1       540     1       555.63 + 1 × 0.21 = 555.84
1969    525     3       1575    9       555.63 + 3 × 0.21 = 556.26
1970    545     5       2725    25      555.63 + 5 × 0.21 = 556.68
1971    585     7       4095    49      555.63 + 7 × 0.21 = 557.10

Total   Σy = 4445   Σx = 0   Σxy = 35   Σx² = 168

The normal equations for estimating a and b in (ii) are :


Σy = na + bΣx               Σxy = aΣx + bΣx²
⇒ 4445 = 8a + 0             ⇒ 35 = a × 0 + 168b
⇒ a = 4445/8 = 555.63       ⇒ b = 35/168 = 0.21


Substituting in (ii), the straight line trend is given by the
equation :
y = 555.63 + 0.21x                               ...(iii)

Putting x = -7, -5, -3, -1, 1, 3, 5 and 7 in (iii) we get the
trend values of sales for the years 1964 to 1971 respectively. The
trend values are shown in the last column of the above table.


(ii) The estimated sales for 1963 are obtained on putting
t = 1963 in :
x = 2(t - 1967.5) = 2(1963 - 1967.5) = -9
Substituting in (iii), the estimated sales for 1963 are :
(y_e)1963 = 555.63 + 0.21 × (-9) = 555.63 - 1.89 = 553.74 lakhs Rs.
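The figures of Example 11.5 can be checked with the following sketch, which carries the constants a and b without rounding. The variable names are illustrative only.

```python
years = list(range(1964, 1972))
sales = [550, 560, 555, 585, 540, 525, 545, 585]    # lakhs of rupees

x = [2 * t - 3935 for t in years]                   # x = 2(t - 1967.5)
a = sum(sales) / len(sales)
b = (sum(xi * yi for xi, yi in zip(x, sales))
     / sum(xi * xi for xi in x))

x_1963 = 2 * 1963 - 3935                            # coded value for 1963
estimate_1963 = a + b * x_1963
print(round(a, 3), round(b, 4), round(estimate_1963, 2))
```

With unrounded constants (a = 555.625, b = 35/168) the estimate for 1963 comes to 553.75 lakhs Rs.; the value 553.74 obtained in the text reflects the rounding of a and b to two decimal places.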
Example 11.6. Fit an equation of the form Y = a + bX + cX² to
the data given below.
X :     1       2       3       4       5
Y :     25      28      33      39      46
[Delhi Uni. B.A. (Econ. Hons.) 1982]
Solution. Here n, the number of pairs, is odd. Hence we
take t = X - (Middle value) = X - 3,             ...(*)
so that the values of t corresponding to X = 1, 2, 3, 4 and 5 are
-2, -1, 0, 1 and 2 respectively.
Let the second degree trend equation between Y and t be :
Y = a + bt + ct²,
where t = X - 3                                  ...(**)
CALCULATIONS FOR SECOND DEGREE TREND


X       Y       t = X - 3   t²      t³      t⁴      tY      t²Y
1       25      -2          4       -8      16      -50     100
2       28      -1          1       -1      1       -28     28
3       33      0           0       0       0       0       0
4       39      1           1       1       1       39      39
5       46      2           4       8       16      92      184

Total   ΣY = 171  Σt = 0    Σt² = 10  Σt³ = 0  Σt⁴ = 34  ΣtY = 53  Σt²Y = 351

The normal equations for estimating a, b and c in (**) are :


ΣY = na + bΣt + cΣt²            ⇒ 171 = 5a + 10c        ...(i)
ΣtY = aΣt + bΣt² + cΣt³         ⇒ 53 = 10b              ...(ii)
Σt²Y = aΣt² + bΣt³ + cΣt⁴       ⇒ 351 = 10a + 34c       ...(iii)

(ii) ⇒ b = 53/10 = 5.3


Multiplying (i) by 2 and then subtracting from (iii) we get :
351 - 2 × 171 = (10a + 34c) - (10a + 20c)
⇒ 14c = 351 - 342 = 9 ⇒ c = 9/14 = 0.64
Substituting in (i) we get
a = (171 - 10c)/5 = (171 - 6.4)/5 = 164.6/5 = 32.92


Substituting the values of a, b and c in (**) we get the trend
equation as :

Y = 32.92 + 5.3t + 0.64t²                        ...(iv)
where t = X - 3.
Hence the second degree trend equation of Y on X becomes :
Y = 32.92 + 5.3(X - 3) + 0.64(X - 3)²
  = 32.92 + 5.3X - 15.9 + 0.64(X² - 6X + 9)
  = (32.92 - 15.90 + 5.76) + (5.30 - 3.84)X + 0.64X²
  = 22.78 + 1.46X + 0.64X²
Remark. The trend values of Y for X = 1, 2, 3, 4 and 5 can be
computed as given below :


COMPUTATION OF TREND VALUES


X    Y    t = X - 3   t²   5.3t     0.64t²   Trend Values
                                             Y_e = 32.92 + 5.3t + 0.64t²
1    25   -2          4    -10.6    2.56     32.92 - 10.6 + 2.56 = 24.88
2    28   -1          1    -5.3     0.64     32.92 - 5.3 + 0.64 = 28.26
3    33   0           0    0        0        32.92 + 0 + 0 = 32.92
4    39   1           1    5.3      0.64     32.92 + 5.3 + 0.64 = 38.86
5    46   2           4    10.6     2.56     32.92 + 10.6 + 2.56 = 46.08


If we compare the original values (Y) and the corresponding
trend values (Y_e), we observe that they are very close. Hence, we
may conclude that the parabolic trend (iv) is a very good fit to the
given data.
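The constants of Example 11.6 can be checked with exact fractions, which also shows how much the two-decimal rounding of c affects the result. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction

X = [1, 2, 3, 4, 5]
Y = [25, 28, 33, 39, 46]
t = [xi - 3 for xi in X]                    # change of origin, so odd-power sums of t vanish

n = len(t)
s2 = sum(ti ** 2 for ti in t)               # sum of t squared
s4 = sum(ti ** 4 for ti in t)               # sum of t to the fourth power
sy = sum(Y)
sty = sum(ti * yi for ti, yi in zip(t, Y))
st2y = sum(ti ** 2 * yi for ti, yi in zip(t, Y))

b = Fraction(sty, s2)                                   # from equation (ii)
c = Fraction(n * st2y - s2 * sy, n * s4 - s2 * s2)      # from equations (i) and (iii)
a = (sy - c * s2) / n                                   # back-substitution in (i)

trend = [float(a + b * ti + c * ti * ti) for ti in t]
print(float(a), float(b), float(c))
print([round(v, 2) for v in trend])
```

The exact values are b = 53/10, c = 9/14 and a = 1152/35, which agree with 5.3, 0.64 and 32.92 of the text to two decimal places.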
11.5.4 Conversion of Trend Equation. Any trend equation
y_e = f(t)                                       ...(*)
depends on three factors, viz.,
(i) The origin of time reference.
(ii) The units of time, viz., yearly, monthly, weekly, etc.
(iii) The units of the given values, i.e., whether the time series
values relate to annual figures, monthly figures or monthly averages.


The trend equation (*) may be recomputed after redefining 
these factors to suit our convenience. We shall discuss below two 
points : 

(1) Shifting of origin and 


(2) Conversion of annual trend equation to monthly trend 
equation when : 


(a) The y-values are in annual totals and 
(b) The y-values are given as monthly averages. 


Shifting of Origin. Quite often, to facilitate comparisons
among trend values, it becomes desirable to shift the origin (the
time period of reference) in a time series to some convenient point.
We shall explain the technique by an example.


Let the straight line annual trend equation be given by :
y_e = a + bx                                     ...(11.21)
where Origin : 1970 (1st July)
x units : one year
y units : Annual Totals.
The constant 'a' in the trend equation (11.21) is the trend
value at the year of origin, viz., 1970, i.e.,
(y_e)1970 = a
Now, if we want to change the time series to have its origin in,
say, 1975, i.e., we want to shift the trend origin 5 years hence
then the new trend equation is obtained on changing the value of
x to x + 5 in (11.21). Thus, the new trend equation becomes :
y_e = a + b(x + 5),                              ...(11.22)
Origin : 1975 (1st July), i.e., x = 0 when t = 1975.
Similarly, if we want to shift the origin to 1967, i.e., 3 years
back, the new trend equation becomes :
y_e = a + b(x - 3),                              ...(11.23)
Origin : 1967 (1st July), i.e., x = 0 when t = 1967.


Thus shifting of origin only affects the value of the constant
‘a’ in the equation (11.21) while the slope ‘b’ of the equation 
remains the same. 

Example 11.7. The parabolic trend equation for the sales
(in '000 Rs.) of a company is given as :

y_e = 15.6 - 0.4x + 0.9x²

[Origin : 1973 (July 1) ; x unit = 1 year ; y unit = yearly sales.]
Shift the origin to 1978.

Solution. The required trend equation with origin shifted to
1978, i.e., 5 years hence is obtained from the given equation on
changing x to x + 5 and is given by :

y_e = 15.6 - 0.4(x + 5) + 0.9(x + 5)²
    = 15.6 - (0.4x + 2) + 0.9(x² + 10x + 25)
    = 15.6 - 2 + 22.5 - 0.4x + 9x + 0.9x²
    = 36.1 + 8.6x + 0.9x² ('000 Rs.)

Origin : 1978 (July 1)
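The shifting of origin in a second degree trend equation amounts to expanding a + b(x + k) + c(x + k)² in powers of x. The sketch below does this for Example 11.7, taking the coefficient of x as -0.4, the value used throughout the worked solution; the function name `shift_origin` is hypothetical.

```python
def shift_origin(coeffs, k):
    """Coefficients of p(x + k), where p(x) = a + b*x + c*x**2 and coeffs = [a, b, c]."""
    a, b, c = coeffs
    return [a + b * k + c * k * k,    # new constant term
            b + 2 * c * k,            # new coefficient of x
            c]                        # coefficient of x squared is unchanged


# Origin moved from 1973 to 1978, i.e., 5 years hence
new_coeffs = shift_origin([15.6, -0.4, 0.9], 5)
print([round(v, 2) for v in new_coeffs])
```

It may be noted that for a parabolic trend the shift changes both the constant term and the coefficient of x, while the coefficient of x² remains the same.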

Conversion of Annual Trend Equation to Monthly Trend
Equation. Let us again consider the annual trend equation (11.21).
The slope 'b' in the equation represents the annual increment in the
y-values. Since the average monthly figure is obtained on dividing
the total annual figure by 12, the trend equation (11.21) converted
to average monthly values becomes :

$y_e = \frac{a}{12} + \frac{b}{12}x$   ...(11.24)
where, Origin : 1970 (1st July)
x units : One year
y units : Monthly figures.

For example, we may say that the average monthly production of
sugar for the four years 1970, 1971, 1972 and 1973 is $y_1, y_2, y_3,
y_4$ respectively. Thus, the x unit is one year, though we are given
average monthly values.

In equation (11.24) the coefficient of x, viz., b/12, represents the
increment in y-values on a monthly basis, but from one month in a
year to the corresponding month in the following year. In
order to obtain a monthly trend equation in which the x values are
also in units of one month, so that the coefficient of x represents the
increment in trend values from month to month, the coefficient b/12
in equation (11.24) has to be further divided by 12. Thus, the
monthly trend equation becomes :
$y_e = \frac{a}{12} + \frac{b}{144}x$   ...(11.25)
where, Origin : 1970 (1st July)
x units : One month
y units : Average monthly values




Thus, if we want to shift the origin in (11.25) from 1st July to the
middle of November, i.e., four and a half months forward, then
equation (11.25) reduces to :

$y_e = \frac{a}{12} + \frac{b}{144}(x + 4.5)$   ...(11.26)

Origin : 15th November, 1970
x units : One month
y units : Average monthly values.

Similarly, if the origin in the monthly trend equation (11.25) is
shifted to the middle of March, i.e., 3.5 months back, it reduces to :

$y_e = \frac{a}{12} + \frac{b}{144}(x - 3.5)$   ...(11.27)


Remark. The annual trend equation (11.21) can also be reduced
to a quarterly trend equation, which will be given by :

$y_e = \frac{a}{4} + \frac{b}{16}x$   ...(11.28)

Origin : 1970 (1st July)
x units : One quarter
y units : Quarterly values.
Example 11.8. The equation for yearly sales (in '000 Rs.) for
a commodity with 1st July, 1971, as origin is Y = 81.6 + 28.8X.
Determine the trend equation to give monthly trend values with
15th Jan. 1972 as origin.
[Delhi Uni. B.A. (Econ. Hons. I), 1985]

Solution. The given annual trend equation reduced to
monthly values becomes :
$y_e = \frac{81.6}{12} + \frac{28.8}{144}x$
$= 6.8 + 0.2x$   ...(*)

[Origin : 1st July 1971 ; x unit = 1 month ;
y unit = average monthly sales (in '000 Rs.)]

We want to shift the origin to the middle of January 1972,
i.e., 15th Jan., 1972. In other words, we have to shift the
origin 6.5 months forward, and the required equation is obtained on
changing x to x+6.5 in (*). Hence the new trend equation is
$y_e = 6.8 + 0.2(x + 6.5)$
$= 6.8 + 0.2x + 1.3$
$= 8.1 + 0.2x$   ...(**)
[Origin : 15th Jan., 1972 ; x unit = 1 month ;
y unit = average monthly sales (in '000 Rs.)]
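
The two steps of Example 11.8 (division by 12 and 144, then the shift of origin) can be sketched in a few lines of Python. This is an illustrative sketch only; the function names annual_to_monthly and shift are hypothetical and do not come from the text.

```python
def annual_to_monthly(a, b):
    """Annual-totals trend y = a + b*x (x in years) -> monthly trend (x in months).

    The constant is divided by 12 and the slope by 144, as in equation (11.25).
    """
    return a / 12, b / 144

def shift(a, b, k):
    """Move the origin of y = a + b*x by k time units (k > 0 forward, k < 0 back)."""
    return a + b * k, b

# Example 11.8: Y = 81.6 + 28.8X, origin 1st July 1971.
a_month, b_month = annual_to_monthly(81.6, 28.8)   # about 6.8 and 0.2
a_new, b_new = shift(a_month, b_month, 6.5)        # origin moved to 15th Jan. 1972
print(round(a_new, 4), round(b_new, 4))
```

The same two functions cover equations (11.26) and (11.27) by passing k = 4.5 and k = -3.5 respectively.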


Example 11.9. Below are given the annual production (in
thousand tons) of a fertiliser factory :
Years      : 1977 1978 1979 1980 1981 1982 1983
Production :   70   75   90   91   95   98  100
(i) Fit a straight line trend by the method of least squares
and tabulate the trend values.
(ii) Convert your annual trend equation into a monthly trend
equation. [Delhi Uni. B.Com. (Hons.) II, 1985]
Solution. Let the straight line trend equation be :
$y = a + bx$   ...(i)

where y is annual production (in thousand tons) ; x = t - 1980, i.e.,
origin is at 1980 and x unit = 1 year.


COMPUTATION OF STRAIGHT LINE TREND
Year    Production       x = t-1980     xy     x^2    Trend values ('000 tons)
        (in '000 tons)                                y_e = 88.4286 + 5.0357x
        (y)
1977        70              -3         -210     9       73.3215
1978        75              -2         -150     4       78.3572
1979        90              -1          -90     1       83.3929
1980        91               0            0     0       88.4286
1981        95               1           95     1       93.4643
1982        98               2          196     4       98.5000
1983       100               3          300     9      103.5357
Total      619               0          141    28


The least squares normal equations for estimating a and b
in (i) are :

$\sum y = na + b\sum x$   and   $\sum xy = a\sum x + b\sum x^2$

i.e., $619 = 7a + b \times 0$   and   $141 = a \times 0 + b \times 28$

so that $a = \frac{619}{7} = 88.4286$   and   $b = \frac{141}{28} = 5.0357$

Substituting in (i), the trend line is given by :
$y_e = 88.4286 + 5.0357x$   ...(ii)

Substituting x = -3, -2, -1, 0, 1, 2 and 3 in (ii) we obtain
the trend values for the years from 1977 to 1983 respectively as
given in the last column of the above table.
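
The arithmetic of part (i) can be reproduced with a short Python sketch. It is offered only as a check under stated assumptions: the variable names are illustrative, and the 1978 figure is taken as 75, the value used in the computation table.

```python
years = [1977, 1978, 1979, 1980, 1981, 1982, 1983]
production = [70, 75, 90, 91, 95, 98, 100]   # 1978 value as in the computation table
origin = 1980

x = [t - origin for t in years]
n = len(x)
sum_x = sum(x)
sum_y = sum(production)
sum_xy = sum(xi * yi for xi, yi in zip(x, production))
sum_xx = sum(xi * xi for xi in x)

# Solve the two normal equations:
#   sum_y  = n*a     + b*sum_x
#   sum_xy = a*sum_x + b*sum_xx
det = n * sum_xx - sum_x ** 2
a = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

trend = [a + b * xi for xi in x]
print(round(a, 4), round(b, 4))
print([round(v, 4) for v in trend])
```

Because the origin is placed at the middle year, the sum of x is zero and the two normal equations separate, which is why a is simply the mean of y. The trend values printed here may differ from the tabulated ones in the fourth decimal place, since the table works from the rounded values of a and b.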


(ii) The linear trend equation fitted to the given data is :
$y_e = 88.4286 + 5.0357x$   ...(*)
[Origin : 1980 ; x unit = 1 year ;
y unit = annual production (in '000 tons)]
The equation (*) reduced to monthly trend equation becomes
[c.f. (11.25)] :

$y_e = \frac{88.4286}{12} + \frac{5.0357}{144}x$
i.e., $y_e = 7.3691 + 0.0350x$

[Origin : 1980 ; x unit = 1 month ;
y unit = monthly production (in '000 tons)]


EXERCISE 11.1 


1. What is a time-series? What are its main components ? Give 
illustrations for each of them. (Bombay Uni. B.Com., 1975) 


2. Discuss briefly the importance of time series analysis in business
and economics. What are the components of a time series ? Give an example
of each component. (Bombay Uni. B.Com., Nov., 1982)
3. (a) Distinguish between additive model and multiplicative model in
the analysis of time series.
[Delhi Uni. B.Com. Hons., 1983]

(b) Give the additive and multiplicative models of the time series
equations and explain briefly the components of a time series.
[C.A. (Intermediate) May 1983]


4. Define trend. Enumerate the different methods of measuring
secular trend in a given time series. [... Uni. M.A. (Econ.) 1983 ;
C.A. Intermediate May 1978 ; Delhi Uni. B.Com. (Hons.) 1974]


5. Discuss the statistical procedure you would adopt in the analysis
of time series and explain how you will isolate the secular trend.
[Osmania Uni. B.Com., Nov. 1981]


6. (a) Distinguish between the seasonal component and trend
component of a time series.
[Delhi Uni. B.A. (Econ. Hons. II), 1983]
(b) Distinguish between secular trend, seasonal variations and cyclical
fluctuations. How would you measure secular trend in any given data ?
(Guru Nanak Dev Uni. B.Com. II, September 1972)
(c) What are secular trend and cyclical, seasonal and irregular
fluctuations ? Describe the methods of isolation of trend.
(Punjab Uni. B.Com. II, Sept. 1982)


7. Explain trend fitting by the method of semi-averages. Discuss its
relative merits and demerits.

Compute the trend values by the method of semi-averages from the data
given below :

Year         : 1962 1963 1964 1965 1966 1967 1968 1969
No. of sheep
(in lakhs)   :   56   55   51   47   42   38   35   32

Ans. Trend values (in lakhs) for the years 1962 to 1969 are :
59, 56, 50.5, 46.5, 41.5, 37, 32.5, 28.


8. Fit a trend line from the following data by using semi-average
method :

Year                 : 1973 1974 1975 1976 1977 1978
Profits (in '000 Rs.) :  100  120  140  150  130  200

(Andhra Pradesh Uni. B.Com., April 1982)
Ans. Joining the points (1974, 120) and (1977, 160) we get the trend
line.


9. Explain the principle of least squares. How is it used in trend
fitting ? What are the relative merits and demerits of trend fitting by the
principle of least squares ?


10. The following table shows the number of salesmen working in a
company :

Year            : 1970 1971 1972 1973 1974
No. of Salesmen :   28   38   46   40   56
Use the method of least squares to fit a straight line trend and estimate
the number of salesmen in 1975.
(Bombay Uni. B.Com. 1978, 1975)
Ans. Trend values : 30, 35.8, 41.6, 47.4, 53.2 ; $(y_e)_{1975} = 59$.
Trend line : $y_e = 41.6 + 5.8x$ ; origin : 1972.
11. From the following data, calculate trend by the method of least
squares.

Year       : 1970 1971 1972 1973 1974 1975 1976
Profits
('000 Rs.) :  300  700  600  800  900  700 1000

(Delhi Uni. B.Com., 1985)
Ans. Trend values (in '000 Rs.) are : 457.16, 542.87, 628.58, 714.29,
800.00, 885.71, 971.42.
Trend line : $y_e = 714.29 + 85.71x$ ; origin : 1973.
12. Production figures of a sugar factory (in '000) are given below : 
Year : 1970 1971 1972 1973 1974 1975 1976 
Production : 12 10 14 11 13 15 16 
(a) Fit a straight line trend to the data. 
(b) Plot these figures on a graph and show the trend line. 
(c) Estimate the production for 1977, 1979 and 1980. 
[Osmania Uni. B.Com. (Hons.), April 1983 ; 
Himachal Pradesh Uni. M.Com., Feb. 1983] 
Ans. (i) $y_e = 13 + 0.75x$ ; origin : 1973.
(ii) Estimated production (in '000) in 1977, 1979 and 1980 is 16,
17.50 and 18.25.
13. Fit a straight line trend to the following data and estimate the
likely profit for the year 1984 :
Year             : 1977 1978 1979 1980 1981 1982 1983
Profit (in lakhs
of Rs.)          :   60   72   75   65   80   85   95

[Delhi Uni. B.Com. (Hons.) II, 1984 ;
Himachal Pradesh Uni. M.A. (Econ.) Feb. 1983]
Ans. $y_e = 76 + 4.86x$ (Origin : 1980 ; y : Rs. in lakhs)
$(y_e)_{1982} = 85.72$ (lakhs of Rs.) ; $(y_e)_{1984} = 95.43$ (lakhs of Rs.)
14. Below are given the figures of production (in thousand quintals) of
a sugar factory :

Year                : 1973 1974 1975 1976 1977 1978 1979
Production (in '000
quintals)           :   80   90   92   83   94   99   92
(i) Fit a straight line trend to these figures by the method of least
squares.
[Delhi Uni. B.A. (Econ. Hons. I), 1983]
(ii) Show the given data and the trend line on the graph paper.
(iii) Estimate the production in 1982.
[Himachal Pradesh Uni. M.Com. 1982 ; Karnataka Uni. B.Com. 1982]
(iv) Find the slope of the straight line trend.
[Kurukshetra Uni. B.Com. Sept. 1981]
(v) Do the figures show a rising trend or a falling trend ?

(vi) What does the difference between the given figures and trend values
indicate ?

Ans. (i) $y_e = 90 + 2x$ ; Origin : 1976 (1st July).
Trend Values ('000 quintals) : 84, 86, 88, 90, 92, 94, 96
(iii) $(y_e)_{1982} = 102$ thousand quintals ;
(iv) Slope = 2 ('000 quintals).


(v) Rising trend ; since slope is positive. 


15. Fit a straight line trend by least squares method to the data given
below and estimate trend for 1983.

Year                : 1977 1978 1979 1980 1981 1982
Sales (in '000 Rs.) :   10   12   15   16   18   19

[Delhi Uni. B.Com. (Hons.) 1983]
Ans. $y_e = 15 + 0.914x$ ; x = 2(t - 1979.5).

Trend values (in '000 Rs.) :
10.430, 12.258, 14.086, 15.914, 17.742, 19.570.
$(y_e)_{1983} = 21.398$ ('000 Rs.).


16. Fit a straight line trend equation by the method of least squares and
estimate the value for 1969.
Year  : 1960 1961 1962 1963 1964 1965 1966 1967
Value :  380  400  650  720  690  600  870  930

[C.A. (Intermediate), May 1979]
Ans. $y_e = 655 + 35.83x$ ; x = 2(t - 1963.5).

Trend Values : 404.19, 475.85, 547.51, 619.17, 690.83, 762.49, 834.15,
905.81

$(y_e)_{1969} = 1049.13$
17. Fit a straight line trend by the method of least squares to the
following data. Assuming that the same rate of change continues, what would
be the predicted earnings for the year 1972 ?

Year      : 1963 1964 1965 1966 1967 1968 1969 1970
Earnings
(Lakh Rs.) :  38   40   65   72   69   60   87   95
(Do not plot the trend values on a graph.)

[Delhi Uni. B.Com. (Hons.) 1972]
Ans. $y_e = 65.75 + 3.67x$ ; x = 2(t - 1966.5)

$(y_e)_{1972} = 106.12$ (lakh Rs.)
18. Convert the following annual trend equation for total sales of a
company to a monthly trend equation :

Y = 162 + 15.8X
(Origin : 1975 ; Scale : 1 unit of X = 1 year).
Forecast the sales for June, 1978 by the two equations. Compare your
results. [Delhi Uni. B.A. (Econ. Hons. II), 1983]

Ans. $y_e = 13.5 + 0.1097x$

[Origin : 1975 ; x unit = 1 month ;
y unit = Monthly Sales]

19. The trend of the annual sales of Bharat Aluminium Company is
described by the following equation :

$y_e = 12 + 0.7x$
[Origin : 1970 ; x unit = 1 year and
y unit = annual production]

Step the equation down to a month to month basis and shift the origin
to 1st January 1970.

(Delhi Uni. M.Com., 1972)
Ans. $y_e = 1 + 0.0048x$ ; [Origin : 1st July 1970 ; x unit = 1 month].
$y_e = 0.9712 + 0.0048x$ ; [Origin : 1st Jan., 1970].
20. Trend equation for yearly sales (in '000 Rs.) for a commodity with
year 1979 as origin is Y = 81.6 + 28.8X. Determine the trend equation to give
monthly trend values with January 1980 as origin.

[A.I.M.A. (Dip. in Management), July 1981]
Ans. $y_e = 8.1 + 0.2x$ ; Origin : Middle of January ; x unit = 1 month ;
y unit = Average monthly sales ('000 Rs.)


21. Fit a parabolic curve of second degree to the data given below and
estimate the value for 1979 and comment on it :

Years               : 1973 1974 1975 1976 1977
Sales (in '000 Rs.) :   10   12   13   10    8
[I.C.W.A. (Final), Dec. 1981 ; Delhi Uni. M.Com. 1978]

Ans. $y_e = 12.314 - 0.6x - 0.857x^2$

Trend Values : 10.086, 12.057, 12.314, 10.857, 7.686

$(y_e)_{1979} = -3.798$ (thousand Rs.). Since the sales cannot be negative, the
given second degree parabolic curve is not a good fit to the given data.


22. Calculate trend values for the following data using a second degree
parabola :

Year : 1971 1972 1973 1974 1975 1976 1977
y    :    7   10   17   28   43   62   85
(Bombay Uni. B.Com., October 1981)

Ans. $y_e = 28 + 13x + 2x^2$ ; x = t - 1974.


23. Below is given the census Population of India, 1901-1971 :

Year          :  1901  1911  1921  1931  1941  1951  1961  1971
Population
(in millions) : 238.3 252.0 251.2 278.9 318.5 361.0 439.1 547.9

(i) Plot the time series on a graph sheet.
(ii) Compute and plot a straight line trend.
(iii) Compute and plot a second degree polynomial trend on the same
graph.
Ans. (ii) $y_e = 335.86 + 20.66x$ ; x = (t - 1936)/5
Trend Values (in millions) : 191.24, 232.56, 273.88, 315.20, 356.52,
397.84, 439.16, 480.48
(iii) $y_e = 293.022 + 20.665x + 2.04x^2$ ; x = (t - 1936)/5
Trend Values (in millions) : 248.33, 240.70, 249.39, 274.40, 315.73,
373.38, 447.35, 537.64.


11.5 Method of Moving Averages. Method of moving
averages is a very simple and flexible method of measuring trend.
It consists in obtaining a series of moving averages (arithmetic
means) of successive overlapping groups or sections of the time
series. The averaging process smoothens out fluctuations and the
ups and downs in the given data. The moving average is characterised
by a constant known as the period or extent of the moving
average. Thus, the moving average of period 'm' is a series of
successive averages (A.M.'s) of m overlapping values at a time,
starting with 1st, 2nd, 3rd value and so on. Thus, for the time
series values $y_1, y_2, y_3, y_4, y_5, ...$ for different time periods, the moving
average (M.A.) values of period 'm' are given by :

1st M.A. $= \frac{1}{m}(y_1 + y_2 + ... + y_m)$
2nd M.A. $= \frac{1}{m}(y_2 + y_3 + ... + y_{m+1})$
3rd M.A. $= \frac{1}{m}(y_3 + y_4 + ... + y_{m+2})$
and so on.
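
The definition above translates directly into code. The following Python sketch is illustrative only: the function name moving_average and the short series are assumptions made for the example, not data from the text.

```python
def moving_average(values, m):
    """Simple moving averages of period m.

    The k-th result is the arithmetic mean of values[k], ..., values[k + m - 1],
    so a series of n values yields n - m + 1 moving averages.
    """
    if m < 1 or m > len(values):
        raise ValueError("period must lie between 1 and the length of the series")
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

series = [4, 7, 10, 6, 8]              # hypothetical observations
print(moving_average(series, 3))       # three averages, for the 2nd, 3rd and 4th periods
```

The function returns only the averages; where each one is placed against the time scale depends on whether the period is odd or even.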
We shall discuss two cases. 


Case (i). When Period is Odd. If the period ‘m’ of the moving 
average is odd, then the successive values of the moving averages 
are placed against the middle values of the corresponding time 
intervals. For example, if m=5, the first moving average value is
placed against the middle period, i.e., 3rd, the second M.A. value is 
placed against the time period 4 and so on, 


Case (ii). When Period is Even. If the period 'm' of the M.A.
is even, then there are two middle periods and the M.A. values are
placed in between the two middle periods of the time intervals it
covers. Obviously, in this case, the M.A. values will not coincide
with a period of the given time series and an attempt is made to
synchronise them with the original data by taking a two-period
average of the moving averages and placing them in between the
corresponding time periods. This technique is called centering and
the corresponding moving average values are called centred moving
averages. In particular, if the period m=4, the first moving average
value is placed against the middle of 2nd and 3rd time intervals ;
the second moving average value is placed in between 3rd and 4th
time periods and so on. These values are given by :

$\bar{y}_1 = \frac{1}{4}(y_1 + y_2 + y_3 + y_4)$,
$\bar{y}_2 = \frac{1}{4}(y_2 + y_3 + y_4 + y_5)$,
$\bar{y}_3 = \frac{1}{4}(y_3 + y_4 + y_5 + y_6)$,   ...(11.29)

and so on. The centred moving averages are obtained on taking
2-period M.A. of $\bar{y}_1, \bar{y}_2, \bar{y}_3$ and so on. Thus,

First Centred M.A. $= \frac{1}{2}(\bar{y}_1 + \bar{y}_2)$
$= \frac{1}{2}\left[\frac{1}{4}(y_1 + y_2 + y_3 + y_4) + \frac{1}{4}(y_2 + y_3 + y_4 + y_5)\right]$
$= \frac{1}{8}(y_1 + 2y_2 + 2y_3 + 2y_4 + y_5)$   ...(11.30)

Similarly,
Second Centred M.A. $= \frac{1}{8}(y_2 + 2y_3 + 2y_4 + 2y_5 + y_6)$,   ...(11.30a)

and so on. These centred moving averages are placed against the
time periods 3, 4, 5, ... and so on.


Equation (11.30) may be regarded as a weighted average of
$y_1, y_2, y_3, y_4$ and $y_5$, the corresponding weights being 1, 2, 2, 2, 1,
i.e.,

$\frac{w_1 y_1 + w_2 y_2 + w_3 y_3 + w_4 y_4 + w_5 y_5}{w_1 + w_2 + w_3 + w_4 + w_5}$

where $w_1 = w_5 = 1$ and $w_2 = w_3 = w_4 = 2$.

Similar interpretation can be given to (11.30a).

From (11.30) and (11.30a) we see that a centred moving
average of period 4 is equivalent to a weighted moving average of
period 5, the corresponding weights being 1, 2, 2, 2, 1. (For verification
of this result, see Example 11.15.)
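
The equivalence can also be checked numerically. The sketch below is an illustration in Python rather than anything prescribed by the text; the function names are hypothetical, and the series is the one used in Example 11.15.

```python
def moving_average(values, m):
    """Simple moving averages of period m."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

def centred_moving_average(values, m):
    """For an even period m: the mean of each pair of adjacent m-period moving averages."""
    ma = moving_average(values, m)
    return [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]

def weighted_moving_average(values, weights):
    """Weighted moving averages: weighted moving totals divided by the sum of the weights."""
    width, total = len(weights), sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + width])) / total
            for i in range(len(values) - width + 1)]

sales = [2, 6, 1, 5, 3, 7, 2, 6, 4, 8, 3]          # series of Example 11.15
centred = centred_moving_average(sales, 4)
weighted = weighted_moving_average(sales, [1, 2, 2, 2, 1])
print(centred)
print(weighted)
print(centred == weighted)
```

Both lists have seven entries, one for each of the third to the ninth observations, because a span of five values is needed for every centred average.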


The moving average values plotted against time give the trend
curve. The basic problem in M.A. method is the determination of
period 'm' and this is discussed in Remark 3 below.


Remarks. 1. Moving Average and Linear Trend. If the time
series data does not contain any movements except the trend
which when plotted on a graph gives a straight line curve, then the
moving average will reproduce the series. The following example will
clarify the point.
Year    Value    3-Yearly    5-Yearly    7-Yearly
(t)     (y)      M.A.        M.A.        M.A.
 1       10        -           -           -
 2       14       14           -           -
 3       18       18          18           -
 4       22       22          22          22
 5       26       26          26          26
 6       30       30          30          30
 7       34       34          34          34
 8       38       38          38          38
 9       42       42          42           -
10       46       46           -           -
11       50        -           -           -


Thus the trend values by the moving average of extent 3, 5, 7 
and so on coincide with the original series. 


2. Moving Average and Curvilinear Trend. If the data does 
not contain any oscillatory or irregular movements and has only general 
trend and the historigram (graph) of the time series gives а curve 
which is convex (concave) to the base, then the trend values computed 
by moving average method will give another curve parallel to the given 
curve but above (below) it. In other words, if there are no variations 
in the data except the trend which is curvilinear then the moving 
average values, when plotted, will exhibit the same curvilinear 
pattern but slightly away from the given historigram. Further, the
greater the period of the moving average, the farther will be the trend
curve from the original historigram. In other words, the difference
between the trend values and the original values becomes larger as
the period of the moving average increases.


3. Period of Moving Average. The moving average will com- 
pletely eliminate the oscillatory movements if : 


(i) The period of the moving average is equal to or a multiple
of the period of oscillatory movements provided they are regular
in period or amplitude, and


(ii) The trend is linear or approximately so. 


Hence, to compute correct trend values by the method of
moving averages, the period or extent of the moving average should
be same as the period of the cyclic movements in the series. However,
if the period of moving average is less or more than the period of
the cyclic movement then it (M.A.) will only reduce their effect.

Quite often we come across time series data which do not exhibit
regular cyclic movements and might reflect different cycles with
varying periods which may be determined on drawing the historigram
of the given time series and observing the time distances between
various peaks. In such a situation, the period of the moving
average is taken as the average period of the various cycles present
in the data. [As an illustration, see Example 11.13.]
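
Remark 3 can be illustrated with a small synthetic series. The Python sketch below rests on assumptions made purely for illustration: a linear trend 10 + 2t, and a regular oscillation of period 5 whose values (3, -1, -4, 0, 2) sum to zero over one cycle. None of these figures come from the text.

```python
def moving_average(values, m):
    """Simple moving averages of period m."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

cycle = [3, -1, -4, 0, 2]                       # hypothetical oscillation, period 5
trend = [10 + 2 * t for t in range(15)]         # hypothetical linear trend
series = [trend[t] + cycle[t % 5] for t in range(15)]

ma5 = moving_average(series, 5)   # period equals the cycle; placed against t = 2, ..., 12
ma3 = moving_average(series, 3)   # period shorter than the cycle; placed against t = 1, ..., 13

# Departure of each moving average from the true trend at its own time period
residual5 = [m - trend[i + 2] for i, m in enumerate(ma5)]
residual3 = [m - trend[i + 1] for i, m in enumerate(ma3)]
print(residual5)   # every entry is zero: the oscillation is removed completely
print(residual3)   # entries differ from zero: the oscillation is only reduced
```

With a period of 5, every window contains exactly one complete cycle, so the oscillation cancels and the linear trend is reproduced; with a period of 3 the cancellation is only partial.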


4. Moving Average and Polynomial Trend. In most
economic and business time series the trend is not linear. Accordingly,
if the trend is curvilinear, the moving average values will
give a distorted picture of the trend. In such a case the correct
trend values are obtained by taking a weighted moving average of
the given values. For example, the weights for a moving average of
extent 5 for a parabolic trend are given by :

$\frac{1}{35}\,[-3,\ 12,\ 17,\ 12,\ -3]$

Thus the first moving average value for the series $y_1, y_2, y_3, ...$ is
given by :

$-\frac{3}{35}y_1 + \frac{12}{35}y_2 + \frac{17}{35}y_3 + \frac{12}{35}y_4 - \frac{3}{35}y_5$

$= \frac{1}{35}(-3y_1 + 12y_2 + 17y_3 + 12y_4 - 3y_5)$.

It may be observed that :

(i) the weights for the M.A. are symmetric about the middle
value, and

(ii) the sum of weights is unity.
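
The effect of these weights can be checked on an exactly parabolic series. The Python sketch below is illustrative only: the parabola 5 + 2t + 3t^2 is a hypothetical trend chosen for the check, and weighted_ma is an assumed helper name.

```python
def weighted_ma(values, weights):
    """Weighted moving averages: weighted moving totals divided by the sum of the weights."""
    width, total = len(weights), sum(weights)
    return [sum(w * v for w, v in zip(weights, values[i:i + width])) / total
            for i in range(len(values) - width + 1)]

PARABOLIC_WEIGHTS = [-3, 12, 17, 12, -3]        # sum is 35, so the scaled weights add to unity

parabola = [5 + 2 * t + 3 * t * t for t in range(9)]   # hypothetical parabolic trend
middle = parabola[2:-2]                                # values the 5-point averages are placed against

weighted = weighted_ma(parabola, PARABOLIC_WEIGHTS)
simple = weighted_ma(parabola, [1, 1, 1, 1, 1])        # ordinary 5-period moving average

print([w - y for w, y in zip(weighted, middle)])   # all zero: the parabola is reproduced
print([s - y for s, y in zip(simple, middle)])     # constant positive gap: values lie above the curve
```

For this upward-bending parabola the simple moving average lies above the original values by a constant amount, in line with Remark 2, while the weighted average coincides with them.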


5. Effect of Moving Average on Irregular Fluctuations. The 
moving average smoothens the ups and downs present in the original 
data and, therefore, reduces the intensity of irregular fluctuations to 
some extent. It can't eliminate them completely. However, greater 
the period of the moving average (up to a certain limit), the greater 
is the amount of reduction in their intensity. Thus, from point of 
view of reducing irregular variations, long-period moving average is 
recommended. However, we have pointed out in Remark 2 that the
greater the period of moving average, the farther are the trend values
from the original values. In other words, longer period of moving
average is likely to give a distorted picture of the trend values. Ac- 
cordingly, as a compromise the period of moving average should 
neither be too large nor too small. The optimum period of the 
moving average is the one that coincides with or is a multiple of the 
period of the cycle in the time series as it would completely eliminate 
cyclical variations, reduce the irregular variations and, therefore, give 
the best possible values of the trend. 


We shall now discuss numerical problems to explain the tech- 
nique of obtaining trend values by moving average method. 
Example 11.10. Calculate three yearly moving average for the
following data.
Year : 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
y    :  242  250  252  249  253  255  251  257  260  265  262
(Bombay Uni., B.Com., April 1983)

Solution.

COMPUTATION OF 3-YEARLY MOVING AVERAGE

Year     y      3-yearly         3-yearly moving
                moving totals    averages (Trend values)
(1)     (2)     (3)              (4)=(3)÷3
1950    242      -                -
1951    250     744              248.0
1952    252     751              250.3
1953    249     754              251.3
1954    253     757              252.3
1955    255     759              253.0
1956    251     763              254.3
1957    257     768              256.0
1958    260     782              260.7
1959    265     787              262.3
1960    262      -                -


Example 11.11. Calculate the trend values by the method of
moving average, assuming a four-yearly cycle, from the following data
relating to sugar production in India :

Year    Sugar Production    Year    Sugar Production
        (lakh tonnes)               (lakh tonnes)
1971        37.4            1977        48.4
1972        31.1            1978        64.6
1973        38.7            1979        58.4
1974        39.5            1980        38.6
1975        47.9            1981        51.4
1976        42.6            1982        84.4

[C.A. (Intermediate), November 1983] 




Solution. Since we are given that the data follows a four- 
yearly cycle, we shall compute the trend values by using moving 
average of period 4. 


COMPUTATION OF 4-YEARLY MOVING AVERAGE

Year    Sugar           4-yearly    4-yearly     2-period        Centred moving
        production      moving      moving       moving total    average
        (lakh tonnes)   totals      average      of col. (4)     (Trend values)
(1)     (2)             (3)         (4)=(3)÷4    (5)             (6)=(5)÷2
1971    37.4
1972    31.1
                        146.7       36.675
1973    38.7                                      75.975          37.99
                        157.2       39.300
1974    39.5                                      81.475          40.74
                        168.7       42.175
1975    47.9                                      86.775          43.39
                        178.4       44.600
1976    42.6                                      95.475          47.74
                        203.5       50.875
1977    48.4                                     104.375          52.19
                        214.0       53.500
1978    64.6                                     106.000          53.00
                        210.0       52.500
1979    58.4                                     105.750          52.88
                        213.0       53.250
1980    38.6                                     111.450          55.73
                        232.8       58.200
1981    51.4
1982    84.4


Example 11.12. Determine the period of the moving average
for the following data and calculate moving averages for that
period :

Year  :   1   2   3   4   5   6   7   8   9  10
Value : 130 127 124 135 140 132 129 127 145 158

Year  :  11  12  13  14  15
Value : 153 146 145 164 170
(Bombay Uni. B.Com. 1975)

Solution. Since the peaks of the given data occur at the years
1, 5, 10 and 15, the data clearly exhibits a regular cyclic movement
with period 5. Hence, the period of the moving average for
determining the trend values is also 5, viz., the period of the cyclic
variations.
COMPUTATION OF FIVE-YEARLY MOVING AVERAGE

Year    Value    5-yearly         5-yearly
                 Moving totals    Moving Average
                                  (Trend Values)
(1)     (2)      (3)              (4)=(3)÷5
 1      130       -                -
 2      127       -                -
 3      124      656              131.2
 4      135      658              131.6
 5      140      660              132.0
 6      132      663              132.6
 7      129      673              134.6
 8      127      691              138.2
 9      145      712              142.4
10      158      729              145.8
11      153      747              149.4
12      146      766              153.2
13      145      778              155.6
14      164       -                -
15      170       -                -


Example 11.13. Find the trend of annual sales of a trading 
organisation by Moving Average Method. 


Year    Annual Sales      Year    Annual Sales
        (Rs. in '000)             (Rs. in '000)

1900 40 1910 42 

1901 42 1911 48 

1902 40 1912 46 

1903 44 1913 52 

1904 49 1914 58 

1905 46 1915 56 

1906 42 1916 51 

1907 44 1917 57 

1908 44 1918 54 

1909 50 1919 63 


(Use the most appropriate period of moving average.)
[Delhi Uni. B.Com. (Hons.) 1977]

Solution. We know that the most appropriate period for the
moving average is the period of the cyclic variations. The given
data does not reveal a regular cycle of any fixed period. If we
examine the data carefully we have the peaks at the following
years :

Year       : 1901 1904 1909 1911 1914 1917 1919
Peak Value :   42   49   50   48   58   57   63
Period     :      3    5    2    3    3    2

Thus the data exhibits 6 cycles with varying periods 3, 5, 2, 3,
3 and 2 respectively. The appropriate period of the moving average
is given by the arithmetic mean of the periods of the different cycles
exhibited by the data. Hence the period of the moving average is
given by :

$\frac{3 + 5 + 2 + 3 + 3 + 2}{6} = \frac{18}{6} = 3$
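
The search for peaks and the averaging of the cycle lengths can be sketched in Python as follows. This is an illustration under stated assumptions, not the text's procedure in code: a peak is taken to be a value strictly greater than both its neighbours, and the final observation is also counted as a peak when it exceeds its predecessor (as 1919 is in the solution). The name find_peaks is hypothetical.

```python
years = list(range(1900, 1920))
sales = [40, 42, 40, 44, 49, 46, 42, 44, 44, 50,
         42, 48, 46, 52, 58, 56, 51, 57, 54, 63]    # data of Example 11.13

def find_peaks(values):
    """Indices of interior local maxima, plus the last point if it rises above its predecessor."""
    peaks = [i for i in range(1, len(values) - 1)
             if values[i - 1] < values[i] > values[i + 1]]
    if values[-1] > values[-2]:        # assumption: a rising final value counts as a peak
        peaks.append(len(values) - 1)
    return peaks

peak_years = [years[i] for i in find_peaks(sales)]
cycle_lengths = [later - earlier for earlier, later in zip(peak_years, peak_years[1:])]
period = round(sum(cycle_lengths) / len(cycle_lengths))

print(peak_years)
print(cycle_lengths)
print(period)
```

Other definitions of a peak are possible and may give a slightly different average; the rule used here is chosen only because it reproduces the peaks listed in the solution.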
COMPUTATION OF 3-YEARLY MOVING AVERAGE

Year    Annual Sales     3-yearly         3-yearly
        (Rs. in '000)    moving totals    moving average
(1)     (2)              (3)              (4)=(3)÷3
1900    40                -                -
1901    42               122              40.67
1902    40               126              42.00
1903    44               133              44.33
1904    49               139              46.33
1905    46               137              45.67
1906    42               132              44.00
1907    44               130              43.33
1908    44               138              46.00
1909    50               136              45.33
1910    42               140              46.67
1911    48               136              45.33
1912    46               146              48.67
1913    52               156              52.00
1914    58               166              55.33
1915    56               165              55.00
1916    51               164              54.67
1917    57               162              54.00
1918    54               174              58.00
1919    63                -                -


Remark. The following diagram clearly exhibits the peaks
along with periods of different cycles.
[Figure : historigram of annual sales, 1900-1919, with the peaks marked ; Pd = Periods]
Example 11.14. What is moving average ? What are its uses in
analysis of time series ?
Given the numbers 2, 6, 1, 5, 3, 7, 2, write down the weighted
moving average of period 3, the weights being 1, 4, 1.
(..., June 1986)
Solution. The weighted moving average is obtained on dividing
the weighted moving totals by the sum of the weights, viz.,
1+4+1=6. Thus,

Weighted M.A. $= \frac{\sum wy}{\sum w}$   ...(*)

COMPUTATION OF WEIGHTED M.A. OF PERIOD 3
Values (X)    Weighted moving totals    Weighted M.A. of period 3
(1)           (2)                       (3)=(2)÷6
2              -                         -
6             1x2+4x6+1x1=27            27÷6=4.5
1             1x6+4x1+1x5=15            15÷6=2.5
5             1x1+4x5+1x3=24            24÷6=4.0
3             1x5+4x3+1x7=24            24÷6=4.0
7             1x3+4x7+1x2=33            33÷6=5.5
2              -                         -


Example 11.15. For the following series of observations,
verify that the 4-year centred moving average is equivalent to a 5-year
weighted moving average with weights 1, 2, 2, 2, 1 respectively :

Year : 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
Annual Sales
(Rs. '000) :  2    6    1    5    3    7    2    6    4    8    3

[C.A. (Intermediate) Nov. 1971 ; I.C.W.A. (Final), Dec. 1976]
Solution. 
TABLE A
COMPUTATION OF 4-YEARLY MOVING AVERAGE

Year    Annual sales    4-yearly    4-yearly     2-point         4-yearly moving
        ('000 Rs.)      moving      M.A.         moving total    average (centred)
                        totals      =(3)÷4       of col. (4)     =(5)÷2
(1)     (2)             (3)         (4)          (5)             (6)
1969     2
1970     6
                        14          3.50
1971     1                                        7.25            3.63
                        15          3.75
1972     5                                        7.75            3.88
                        16          4.00
1973     3                                        8.25            4.13
                        17          4.25
1974     7                                        8.75            4.38
                        18          4.50
1975     2                                        9.25            4.63
                        19          4.75
1976     6                                        9.75            4.88
                        20          5.00
1977     4                                       10.25            5.13
                        21          5.25
1978     8
1979     3
As in the earlier example, the weighted average is obtained on
dividing the weighted totals by the sum of the weights, i.e., by using
the formula :

Weighted M.A. $= \frac{\sum wy}{\sum w}$
where $\sum w = 1+2+2+2+1 = 8$


TABLE B
COMPUTATION OF 5-YEARLY WEIGHTED MOVING
AVERAGE

Year    Sales         5-yearly weighted moving    5-yearly weighted
        ('000 Rs.)    totals                      moving average
                                                  =(3)÷8
(1)     (2)           (3)                         (4)
1969     2
1970     6
1971     1            1x2+2(6+1+5)+1x3=29         3.63
1972     5            1x6+2(1+5+3)+1x7=31         3.88
1973     3            1x1+2(5+3+7)+1x2=33         4.13
1974     7            1x5+2(3+7+2)+1x6=35         4.38
1975     2            1x3+2(7+2+6)+1x4=37         4.63
1976     6            1x7+2(2+6+4)+1x8=39         4.88
1977     4            1x2+2(6+4+8)+1x3=41         5.13
1978     8
1979     3

From Tables A and B we see that the 4-yearly centred
moving average is equivalent to 5-yearly weighted moving average
with weights 1, 2, 2, 2, 1 respectively.
Remark. This result is true, in general, for any time series.
Here we have just verified the result for the given time series.


Merits and Demerits of Moving Average Method 


Merits 1. This method does not require any mathematical 
complexities and is quite simple to understand and use as compared 
with the principle of least squares method. 


2. Unlike the ‘free hand curve’ method, this method does not 
involve any element of subjectivity since the choice of the period of 
moving average is determined by the oscillatory movements in the 


data and not by the personal judgement of the investigator. 
3. Unlike the method of trend fitting by principle of least 


squares, the moving average method is quite flexible in the sense 
that a few more observations may be added to the given data with- 
out affecting the trend values already obtained. The addition of 
some new observations will simply result in some more trend values 
at the end. 


4. The oscillatory movements can be completely eliminated
by choosing the period of the M.A. equal to or a multiple of the
period of cyclic movement in the given series. [See Remark 3, page
634.] A proper choice of the period [See Remark 5, page 635]
also reduces the irregular fluctuations to some extent.

5. In addition to the measurement of trend, the method of
moving averages is also used for measurement of seasonal, cyclical
and irregular fluctuations.


Limitations. 1. An obvious limitation of the moving average 
method is that we cannot obtain the trend values for all the given 
observations. We have to forego the trend values for some 

observations at both the extremes (i.e., in the beginning and at the
end) depending on the period of the moving average. For example, 
for a moving average of period 5, 7 and 9 we lose the trend values 
for the first and last 2, 3 and 4 values respectively. 


2. Since the trend values obtained by moving average method 
cannot be expressed by any functional relationship, this method can- 
not be used for forecasting or predicting future values which is the 
main objective of trend analysis. 


3. The selection of the period of moving average is very 
important and is not easy to determine particularly when the time 
series does not exhibit cycles which are regular in period and ampli- 
tude. In such a case the moving average will not completely 
eliminate the oscillatory movements and consequently the moving 
average values will not represent a true picture of the general 
trend. [See Remark 3, page 634 for determining the period of M.A.] 
4. In case of non-linear trend, which is generally the case in
most of economic and business time series, the trend values given
by the moving average method are biased and they lie either above
or below the true sweep of the data. According to Waugh :

"If the trend line is concave upward (like the side of a
bowl), the value of the moving average will always be too high ; if
the trend is concave downward (like the side of a derby hat), the
value of the moving average will always be too low."


As already pointed out, [see Remark 4, page 635], in case of 
polynomial trend, appropriate trend values are obtained. by using 
a weighted moving average with suitable weights. 


Keeping in view the limitations, the moving average method 
is recommended under the following situations : 
(i) If trend is linear or approximately so. 


(ii) The oscillatory movements describing the given time 
series are regular both in period and amplitude. 


(iii) If forecasting is not required. 


EXERCISE 11.2 


1. (d) Explain the method of moving average. How is it used in 
measuring trend in the analysis of a time series ? 


(b) Explain how trend is obtained by the method of moving averages 
in the analysis ofa time series. What are the merits and demerits of the 
method ? [Delhi Uni. B.Com., (Hons.), 1973 ; Bombay Uni. B.Com., Oct. 1974] 


(c) State the conditions under which a moving average can be recom- 
mended for trend analysis. How will you determine the period of the moving 
average ? 

2. (a) What is Time Series ? Mention its chief components. What is a 


moving average ? What are its uses in Time Series ? 
U.C.W.A. (Final), June 1983) 


(b) Explain how trend is eliminated from a time series by the moving 
average method. Use a suitable illustration. 
(Guru Nanak Dey Uni. B.Com. II, April 1983). 


3. (a) How are the Moving Average (M.A.) values affected if the 
period of M.A. is increased ? 


What is the effect of increase of the period of M.A. on the irregular 
fluctuations ? 


b) What are the limitations and advantages of the moving average 
Түшү trend fitting ? [Delhi Uni. B.Com. (Hons.) 1973] 


4, Explain briefly the various methods of determining trend in a time 
series, 
$ Using three-year moving averages, determine the trend and short-term 
fluctuations. Plot the original and trend values on the same graph paper. 
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Remark. This result is true, in general, for any time series, 
Here we have just verified the result for the given time series. 


Merits and Demerits of Moving Average Method 


Merits. 1. This method does not require any mathematical
complexities and is quite simple to understand and use as compared
with the principle of least squares method.


2. Unlike the ‘free hand curve’ method, this method does not
involve any element of subjectivity since the choice of the period of
moving average is determined by the oscillatory movements in the
data and not by the personal judgement of the investigator.


3. Unlike the method of trend fitting by principle of least
squares, the moving average method is quite flexible in the sense
that a few more observations may be added to the given data without
affecting the trend values already obtained. The addition of
some new observations will simply result in some more trend values
at the end.


4. The oscillatory movements can be completely eliminated
by choosing the period of the M.A. equal to or a multiple of the
period of cyclic movement in the given series [See Remark 3, page
634]. A proper choice of the period [See Remark 5, page 635]
also reduces the irregular fluctuations to some extent.


5. In addition to the measurement of trend, the method of
moving averages is also used for measurement of seasonal, cyclical
and irregular fluctuations.


Limitations. 1. An obvious limitation of the moving average
method is that we cannot obtain the trend values for all the given
observations. We have to forego the trend values for some
observations at both the extremes (i.e., in the beginning and at the
end) depending on the period of the moving average. For example,
for a moving average of period 5, 7 and 9 we lose the trend values
for the first and last 2, 3 and 4 values respectively.


2. Since the trend values obtained by moving average method
cannot be expressed by any functional relationship, this method cannot
be used for forecasting or predicting future values, which is the
main objective of trend analysis.


3. The selection of the period of moving average is very
important and is not easy to determine, particularly when the time
series does not exhibit cycles which are regular in period and amplitude.
In such a case the moving average will not completely
eliminate the oscillatory movements and consequently the moving
average values will not represent a true picture of the general
trend. [See Remark 3, page 634 for determining the period of M.A.]




4. In case of non-linear trend, which is generally the case in 
most of economic and business time series, the trend values given 
by the moving average method are biased and they lie either above 
or below the true sweep of the data. According to Waugh : 


“If the trend line is concave downwards (like the side of a 
bowl), the value of the moving average will always be too high, if 
the trend is concave upward (like the side of a derby pot), the 
value of the moving average will always be too low.” 


As already pointed out [see Remark 4, page 635], in case of
polynomial trend, appropriate trend values are obtained by using
a weighted moving average with suitable weights.


Keeping in view the limitations, the moving average method 
is recommended under the following situations : 
(i) If trend is linear or approximately so. 


(ii) The oscillatory movements describing the given time 
series are regular both in period and amplitude. 


(iii) If forecasting is not required. 
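
The first and fourth limitations above can be illustrated numerically. The following is only a minimal sketch: the helper `centred_moving_average` is a hypothetical name, it handles odd periods only, and the series y = t² and y = -t² are made up for illustration. It counts the trend values lost at each end and checks that the simple moving average lies above a bowl-shaped curve and below the same curve turned upside down.

```python
def centred_moving_average(values, period):
    """Simple moving average of odd period, aligned with the middle term."""
    if period < 1 or period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    return [
        sum(values[i - half:i + half + 1]) / period
        for i in range(half, len(values) - half)
    ]


# Trend values lost at each end for periods 5, 7 and 9 (illustrative series).
series = list(range(1, 21))
lost = {p: (len(series) - len(centred_moving_average(series, p))) // 2
        for p in (5, 7, 9)}

# Bias under a curved trend: compare the M.A. with the true middle value.
t = list(range(7))
bowl = [x * x for x in t]       # shaped like the side of a bowl
dome = [-(x * x) for x in t]    # the same curve turned upside down
bowl_ma = centred_moving_average(bowl, 3)
dome_ma = centred_moving_average(dome, 3)
bowl_too_high = all(m > y for m, y in zip(bowl_ma, bowl[1:-1]))
dome_too_low = all(m < y for m, y in zip(dome_ma, dome[1:-1]))

print(lost)                           # values lost at each end, per period
print(bowl_too_high, dome_too_low)
```

For a straight-line series the same helper reproduces the series exactly at the interior points, which is why condition (i) above asks for a linear or nearly linear trend.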


EXERCISE 11.2 


1. (a) Explain the method of moving average. How is it used in 
measuring trend in the analysis of a time series ? 


(b) Explain how trend is obtained by the method of moving averages
in the analysis of a time series. What are the merits and demerits of the
method ? [Delhi Uni. B.Com. (Hons.), 1973 ; Bombay Uni. B.Com., Oct. 1974]


(c) State the conditions under which a moving average can be recommended
for trend analysis. How will you determine the period of the moving
average ?


2. (a) What is Time Series ? Mention its chief components. What is a
moving average ? What are its uses in Time Series ?
[I.C.W.A. (Final), June 1983]


(b) Explain how trend is eliminated from a time series by the moving
average method. Use a suitable illustration.
(Guru Nanak Dev Uni. B.Com. II, April 1983)


3. (a) How are the Moving Average (M.A.) values affected if the
period of M.A. is increased ?


What is the effect of increase of the period of M.A. on the irregular
fluctuations ?


(b) What are the limitations and advantages of the moving average
method of trend fitting ? [Delhi Uni. B.Com. (Hons.) 1973]


4. Explain briefly the various methods of determining trend in a time
series.


Using three-year moving averages, determine the trend and short-term
fluctuations. Plot the original and trend values on the same graph paper.

Year    Production        Year    Production
        (in '000 tons)            (in '000 tons)
1968    21                1973    22
1969    22                1974    25
1970    23                1975    26
1971    25                1976    27
1972    24                1977    26

[Delhi Uni. B.Com. (Hons.), 1982]
Ans. Trend values : 22, 23.3, 24, 23.7, 23.7, 24.3, 26, 26.3.
Using additive model, short-term fluctuations are :
0, -0.3, 1.0, 0.3, -1.7, 0.7, 0, 0.7

5. Calculate the long-term trend and short-term oscillations with a
three-year period from the following data.

Year    Output of tea      Year    Output of tea
        (tonnes)                   (tonnes)
1969    1632               1973    2620
1970    1557               1974    3120
1971    1652               1975    3236
1972    2100               1976    3562

(Kerala Uni. B.Com., April 1978)
Ans. Trend values : 1613.67, 1769.67, 2124.00, 2613.33, 2992.00, 3306.00
Short-term oscillations (assuming multiplicative model) :
96.49, 93.35, 98.87, 100.26, 104.28, 97.88


6. From the following data calculate the 4-yearly moving average and
determine the trend values. Find the short-term fluctuations. Plot the original
data and the trend on a graph.

Year    Value      Year    Value
1958    50.0       1963    38.1
1959    36.5       1964    32.6
1960    43.0       1965    41.7
1961    44.5       1966    41.1
1962    38.9       1967    33.8

[C.A. (Intermediate), May 1980]
Ans. Trend values : 42.1, 40.9, 39.8, 38.2, 38.1, 37.8
Short-term oscillations (assuming multiplicative model) :
102.14, 108.80, 97.74, 99.74, 85.56, 110.32


7. Assuming a four-yearly cycle, calculate the trend by method of moving
averages for the following data :

Year    Sales                 Year    Sales
        (in Lakhs of Rs.)             (in Lakhs of Rs.)
1961    500                   1966    540
1962    520                   1967    560
1963    550                   1968    570
1964    470                   1969    590
1965    510                   1970    610

(Andhra Pradesh Uni. B.Com., May 1978)
Ans. 511.25, 515.00, 518.75, 532.50, 555.00, 573.75
8. The following data give daily sales of a shop observing a five-day
week, over four successive weeks. Determine the period of the moving average
and calculate the moving averages accordingly.

Day   :  1   2   3   4   5   6   7   8   9  10
Sales : 26  29  35  47  51  26  32  47  46  53
Day   : 11  12  13  14  15  16  17  18  19  20
Sales : 21  30  36  46  54  28  31  36  46  54
[C.A. (Intermediate), Nov. 1974]

Ans. Since the peak sales are on the 5th, 10th, 15th and 20th day, the data
clearly exhibit a cycle of period 5. Hence the period of M.A. will be taken
as 5. The 5-day M.A. (trend) values are :

37.6 (3rd day), 37.6, 38.2, 38.6, 38.4, 38.8, 39.2, 38.8, 38.6, 38.6, 38.6,
38.8, 39.0, 39.0, 39.0, 39.0 (18th day).


9. Find the trend for the following series using a three-year weighted
moving average with weights 1, 2, 1.

Year  : 1  2  3  4  5  6   7
Value : 2  4  5  7  8  10  13
[I.C.W.A. (Final), June 1978]
Ans. 3.75, 5.25, 6.75, 8.25, 10.25
10. For the following series of observations, verify that the 6-year
centred moving average is equivalent to a 7-year weighted moving average with
weights 1, 2, 2, 2, 2, 2, 1 respectively.

Year             : 1970  1971  1972  1973  1974  1975
Sales (in '0000) :    2     4     3     6     7     9
Year             : 1976  1977  1978  1979  1980
Sales (in '0000) :    4     6     7     8    10
[I.C.W.A. (Intermediate), June 1985]
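
The equivalence asked for in Exercise 10 can also be checked numerically. The sketch below is only an illustration: the function names `centred_ma_even` and `weighted_ma` are hypothetical, the sales figures are taken as printed in the exercise, and exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def centred_ma_even(values, period):
    """Centred moving average for an even period (mean of two adjacent M.A.s)."""
    ma = [Fraction(sum(values[i:i + period]), period)
          for i in range(len(values) - period + 1)]
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


def weighted_ma(values, weights):
    """Weighted moving average; the weights are normalised by their sum."""
    n, total = len(weights), sum(weights)
    return [Fraction(sum(w * v for w, v in zip(weights, values[i:i + n])), total)
            for i in range(len(values) - n + 1)]


sales = [2, 4, 3, 6, 7, 9, 4, 6, 7, 8, 10]   # 1970 to 1980, as printed
centred = centred_ma_even(sales, 6)
weighted = weighted_ma(sales, [1, 2, 2, 2, 2, 2, 1])
print(centred == weighted)
```

The agreement is not special to this series: averaging two adjacent 6-term means counts the two end values once and the five middle values twice, with divisor 12.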
11.6. Measurement of Seasonal Variations. As already
pointed out, by seasonal variations in a time series we mean the
variations due to such forces which operate in a regular periodic
manner with period less than one year. The study of such variations,
which are predominantly exhibited by most of the economic and
business time series, is of paramount importance to a businessman
or sales manager for planning future production and in scheduling
purchases, inventory control, personnel requirements, and selling
and advertising programmes. The objectives for studying seasonal
patterns in a time series may be classified as follows :

(i) To isolate the seasonal variations, i.e., to determine the
effect of seasonal swings on the value of a given phenomenon, and

(ii) To eliminate them, i.e., to determine the value of the
phenomenon if there were no seasonal ups and downs in the series.
This is called de-seasonalising the given data and is necessary for
the study of cyclic variations.

Obviously, for the study of seasonal variations, the time series
data must be given for [...] weekly, daily or hourly. The study of
seasonal variations assumes that the seasonal pattern is su[...]

(i) Method of ‘Simple Averages’.
(ii) ‘Ratio to Trend’ method.
(iii) ‘Ratio to Moving Average’ method.
(iv) ‘Link Relative’ method.
11.6.1. Method of Simple Averages. This is the simplest
method of measuring seasonal variations in a time series and
involves the following steps. (We shall explain the steps for monthly
data. They can be modified accordingly for quarterly, weekly or
daily data.)

(i) Arrange the data by years and months.

(ii) Compute the average (Arithmetic Mean) $\bar{x}_i$ for the ith
month ; i = 1, 2, ..., 12. Thus $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{12}$ are the average
values for January, February, ..., December respectively, the average being
taken over different years, say, k in number.

(iii) Obtain the overall average $\bar{x}$ of these averages obtained in
step (ii). This is given by :

$$\bar{x} = \frac{1}{12}\left(\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{12}\right) \qquad \ldots(11.31)$$

(iv) Seasonal indices for different months are obtained on
expressing each monthly average as a percentage of the overall
average $\bar{x}$, i.e.,

$$\text{Seasonal Index for any month} = \frac{\text{Monthly Average}}{\bar{x}} \times 100 \qquad \ldots(11.32)$$

Thus,

$$\text{Seasonal Index for January} = \frac{\bar{x}_1}{\bar{x}} \times 100$$

$$\text{Seasonal Index for February} = \frac{\bar{x}_2}{\bar{x}} \times 100$$

$$\vdots$$

$$\text{Seasonal Index for December} = \frac{\bar{x}_{12}}{\bar{x}} \times 100$$


Remarks. 1. If we are given quarterly data for different years,
then we compute the average value $\bar{x}_i$, (i = 1, 2, 3, 4), for each quarter
over different years and then

$$\bar{x} = \frac{1}{4}\left(\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \bar{x}_4\right) \qquad \ldots(11.33)$$

Finally, seasonal index numbers for different quarters are
given by the formula :

$$\text{Seasonal Index for ith quarter} = \frac{\bar{x}_i}{\bar{x}} \times 100 \qquad \ldots(11.34)$$

2. The sum of the seasonal indices must be 1200 for monthly
data and 400 for quarterly data.

3. From computational point of view, a somewhat convenient
formula for computing the seasonal index is obtained on
substituting the value of $\bar{x}$ in (11.32). Thus we get :

$$\text{Seasonal Index for any month} = \frac{\text{Monthly Average}}{\frac{1}{12}\,(\text{Sum of monthly averages})} \times 100 = \frac{\text{Monthly Average} \times 1200}{\text{Sum of monthly averages}} \qquad \ldots(11.35)$$

Similarly we shall have :

$$\text{Seasonal Index for any quarter} = \frac{\text{Quarterly Average} \times 400}{\text{Sum of quarterly averages}} \qquad \ldots(11.36)$$

A more simplified formula is as follows :

If $T_i$ is the total for the ith season [i = 1, 2, ..., 12 for monthly
data], over the given k different years, then :

$$\text{Seasonal Index for ith season} = \frac{\bar{x}_i}{\bar{x}} \times 100 = \frac{T_i/k}{\bar{T}/k} \times 100 = \frac{T_i}{\bar{T}} \times 100 \qquad \ldots(11.36a)$$

where $\bar{T}$ is the average of the seasonal totals $T_i$.
So instead of seasonal means we may use seasonal totals.


Limitations. The method of simple averages, though very
simple to apply, gives only approximate estimates of the pattern of
seasonal variations in the series. It assumes that the data do not
contain any trend and cyclical fluctuations at all or their effect on
the time series values is not quite significant. This is a very serious
limitation since most of the economic and business time series
exhibit definite trends and are affected to a great extent by cycles.
Accordingly, the indices obtained by this method do not truly
represent the seasonal swings in the data because they include the
influence of trend and cyclic variations also. This method tries to
eliminate the random or irregular component by averaging the
monthly (or quarterly) values over different years. In order to
arrive at any meaningful seasonal indices, first of all trend effects
should be eliminated from the given values. This is done in the
next two methods, viz., ‘ratio to trend’ method and ‘ratio to moving
average’ method.

Example 11.16. Use the method of monthly averages to determine
the monthly indices for the following data of production of a
commodity for the years 1979, 1980, 1981 :

Month        1979   1980   1981
             (Production in lakhs of tonnes)
January       12     15     16
February      11     14     15
March         10     13     14
April         14     16     16
May           15     16     15
June          15     …      …
July          16     …      …
August        13     12     13
September     11     13     10
October       10     12     10
November      …      …      …
December      …      …      …
[C.A. (Intermediate), May 1982]


COMPUTATION OF SEASONAL INDICES

Month        Production in lakhs of tonnes        Seasonal
             1979   1980   1981   Total           Indices
January       12     15     16     43             104.88
February      11     14     15     40              97.56
March         10     13     14     37              90.24
April         14     16     16     46             112.20
May           15     16     15     46             112.20
June          15     …      …      …              114.63
July          16     …      …      …              119.51
August        13     12     13     38              92.68
September     …
October       …
November      …
December      …
Average       …

Aliter. Instead of using the totals (over different years)
for different months, we could use the average values for different
months. For example, the average production for the ith month is
given by :

$$\bar{x}_i = \frac{T_i}{3} \; ; \quad (i = 1, 2, \ldots, 12)$$

where $T_i$ is the total production for the ith month.

For example :

$$\bar{x}_1 = \frac{43}{3} = 14.33 \; ; \qquad \bar{x}_2 = \frac{40}{3} = 13.33$$

and so on. Now proceed as in Example 11.17.


Example 11.17. Compute the seasonal index for the following
data assuming that there is no need to adjust the data for the trend.

Quarter   1970   1971   1972   1973   1974   1975
I          3.5    3.5    3.5    4.0    4.1    4.2
II         3.9    4.1    3.9    4.6    4.4    4.6
III        3.4    3.7    3.7    3.8    4.2    4.3
IV         3.6    4.8    4.0    4.5    4.5    4.7
[Delhi Uni. B.A. Econ. (Hons.), 1978]


Solution. Since we are given that there is no need to adjust
the data for trend, the appropriate method for computing the seasonal
indices is the ‘simple average’ method.


COMPUTATION OF SEASONAL INDICES

Year              I Qrt     II Qrt    III Qrt   IV Qrt
1970              3.5       3.9       3.4       3.6
1971              3.5       4.1       3.7       4.8
1972              3.5       3.9       3.7       4.0
1973              4.0       4.6       3.8       4.5
1974              4.1       4.4       4.2       4.5
1975              4.2       4.6       4.3       4.7
Total             22.8      25.5      23.1      26.1
Average (A.M.)    3.80      4.25      3.85      4.35
Seasonal Index    93.60     104.68    94.83     107.14

Each seasonal index is the quarterly average expressed as a percentage of the
average of the averages, e.g., (3.80/4.06) × 100 = 93.60 for the first quarter.
The average of the averages is :

$$\bar{x} = \frac{3.80 + 4.25 + 3.85 + 4.35}{4} = \frac{16.25}{4} = 4.06$$

11.6.2. Ratio to Trend Method. This method is an improvement
over the ‘simple average’ method of measuring seasonality and
is based on the assumption that the seasonal fluctuations for any
season (month, for monthly data and quarter, for quarterly data)
are a constant factor of the trend. The following are the steps for
measuring seasonal indices by this method.


(i) Compute the trend values (monthly or quarterly as the
case may be), by the principle of least squares by fitting an appropriate
mathematical curve (straight line, second degree parabolic
curve or exponential curve, etc.).


(ii) Assuming multiplicative model of time series, the trend
is eliminated by dividing the given time series values (y) for each season
(month or quarter) by the corresponding trend values (T) and multiplying
by 100. Thus

$$\text{Trend eliminated values} = \frac{y}{T} \times 100 \qquad \ldots(11.37)$$

$$= \frac{TSCI}{T} \times 100 = SCI \times 100 \qquad \ldots(11.37a)$$

These percentages will, therefore, include seasonal, cyclical
and irregular fluctuations. Further steps are more or less same as
in the ‘simple average’ method.


(iii) Arrange these trend eliminated values according to years
and months or quarters. An attempt is made to eliminate the cyclical
and irregular variations by averaging the percentages for different
months or quarters, over the given years. Arithmetic mean
or median may be used for averaging. These averages give the
preliminary indices of seasonal variations for different seasons
(months or quarters).


(iv) Lastly, these seasonal indices are adjusted to a total of
1200 for monthly data or 400 for quarterly data by multiplying
each of them with a constant factor k given by

$$k = \frac{1200}{\text{Sum of monthly indices}} \quad \text{or} \quad k = \frac{400}{\text{Sum of quarterly indices}}$$

for monthly or quarterly data respectively. This step amounts to
expressing the preliminary seasonal indices as a percentage of their
arithmetic mean.

Merits and Demerits. Since this method determines the indices
of seasonal variations after eliminating the trend component,
it definitely gives more representative values of seasonal swings as
compared with the ‘simple average’ method. However, the averaging
process over different years will not completely eliminate the
cyclical effects, particularly if the cyclical swings are obvious and
pronounced in the given series. Accordingly, the indices of seasonal
variations obtained by this method are mingled with cyclical effects
also and are, therefore, biased and not truly representative. Hence,
this method is recommended if the cyclical movements are either
absent or, if present, their effect is not so significant. If the data
exhibit pronounced cyclical swings, then the seasonal indices based
on the ‘Ratio to Moving Average’ method, discussed in § 11.6.3, will
reflect the seasonal variations better than this method. However, as
compared with the moving average method, a distinct advantage of this
method is that trend values can be obtained for each month (quarter)
for which data are available, whereas there is loss of information
of certain trend values (in the beginning and at the end) in the
ratio to moving average method.

Remark. If we are given the monthly (or quarterly) figures
for different years, then the fitting of trend equation to monthly
(quarterly) data, which involves a fairly large number of observations,
by the principle of least squares is quite tedious and time consuming.
In such a situation, the calculations are simplified to a great
extent by first fitting the trend equation to annual totals or average
monthly or quarterly values and then adjusting or modifying it to
monthly or quarterly values as explained in equations (11.35) and
(11.38) § 11.5.4. This technique is explained in the following
Example 11.22.

Example 11.18. Using ‘Ratio to Trend’ method, determine the
quarterly seasonal indices for the following data.


Production of Coal (in Million of Tons)

Year    I Qrt.   II Qrt.   III Qrt.   IV Qrt.
1        68       60        61         63
2        70       58        56         60
3        68       63        68         67
4        65       59        56         62
5        60       55        51         58

Solution.

COMPUTATION OF LINEAR TREND

Year   Yearly    Quarterly       x = t - 3    x²      xy        Trend Values
(t)    Totals    Averages (y)                                   (Million tons)
                                                                y_e = 61.4 - 1.45x
1      252       63.0            -2           4      -126.0     64.30
2      244       61.0            -1           1       -61.0     62.85
3      266       66.5             0           0         0.0     61.40
4      242       60.5             1           1        60.5     59.95
5      224       56.0             2           4       112.0     58.50

Total            Σy = 307        Σx = 0       Σx² = 10   Σxy = -14.5

Let the straight line trend equation be :

$$y = a + bx \qquad \ldots(*)$$

Origin : 3rd year ; x units : 1 year,
and y units : Average quarterly production (in Million tons).


The normal (least-square) equations for estimating a and b in (*)
are :

$$\sum y = na + b\sum x \quad \text{and} \quad \sum xy = a\sum x + b\sum x^2$$

Since $\sum x = 0$, these give :

$$a = \frac{\sum y}{n} = \frac{307}{5} = 61.4 \; , \qquad b = \frac{\sum xy}{\sum x^2} = \frac{-14.5}{10} = -1.45$$

Hence the straight line trend is given by the equation :

$$y_e = 61.4 - 1.45x \qquad \ldots(**)$$

Origin : 3rd year ; x unit = 1 year ;
y unit : Average quarterly production.


Putting x = -2, -1, 0, 1, 2 we obtain the average quarterly
trend values for the years 1 to 5 respectively, which are given in the
last column of the above table.


From the trend equation (**), we observe that :
Yearly increment in the trend values = b = -1.45

$$\Rightarrow \quad \text{Quarterly increment} = \frac{b}{4} = -0.36$$

The negative value of b implies that we have a declining trend.
Now we have to determine the quarterly trend values for each year.
The average quarterly trend value for the 1st year is 64.30.
This is, in fact, the trend value for the middle quarter, i.e., half of
the 2nd quarter and half of the 3rd quarter, for the first year. Since the
quarterly increment is -0.36, we obtain the trend values for the
2nd and 3rd quarters of the first year as :

$$64.30 - \tfrac{1}{2}(-0.36) \quad \text{and} \quad 64.30 + \tfrac{1}{2}(-0.36)$$

i.e., 64.30 + 0.18 and 64.30 - 0.18
i.e., 64.48 and 64.12

respectively. The trend value for the 1st quarter now becomes
64.48 + 0.36 = 64.84. Since the quarterly increment is -0.36, the trend
values for the 4th quarter of the 1st year and the remaining quarters of
other years are obtained on subtracting 0.36 from the value of the 3rd
quarter, viz., 64.12, successively. Trend values are given in the
following table.


COMPUTATION OF SEASONAL INDICES

                 Trend Values                        Trend Eliminated Values
                                                     (Given values as % of Trend values)
Year    I Qrt    II Qrt   III Qrt   IV Qrt      I Qrt     II Qrt    III Qrt   IV Qrt
1       64.84    64.48    64.12     63.76       104.87     93.05     95.13     98.81
2       63.39    63.03    62.67     62.31       110.43     92.02     89.36     96.29
3       61.94    61.58    61.22     60.86       109.78    102.31    111.07    110.09
4       60.50    60.14    59.78     59.42       107.44     98.10     93.68    104.34
5       59.06    58.70    58.34     57.98       101.59     93.70     87.42    100.03

Total                                           534.11    479.18    476.66    509.56
Average (A.M.) Seasonal Indices                 106.82     95.84     95.33    101.91
Adjusted Seasonal Indices                       106.85     95.86     95.35    101.94


Sum of indices = 106.82 + 95.84 + 95.33 + 101.91 = 399.90


Since this is not exactly 400, the seasonal indices obtained as
arithmetic mean are adjusted to a total of 400 by multiplying each of
them with a constant factor, called correction factor :

$$k = \frac{400}{399.90} = 1.00025$$
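
The whole of Example 11.18 can be retraced in code. The sketch below is an illustration only: the function name `ratio_to_trend_indices` is hypothetical, a straight line trend fitted to the yearly average quarterly values is assumed (as in the solution above), and the quarterly figures are those consistent with the yearly totals 252, 244, 266, 242 and 224 of the solution table. The quarterly increment b/4 is kept unrounded, so the indices agree with the printed ones only to about the first decimal place.

```python
def ratio_to_trend_indices(data):
    """Quarterly seasonal indices by the ratio-to-trend method (linear trend).

    data is a list of years, each a list of four quarterly values.
    An odd number of years is assumed so that x = ..., -1, 0, 1, ... sums to zero.
    """
    n = len(data)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of years")
    xs = [i - n // 2 for i in range(n)]
    ys = [sum(year) / 4 for year in data]             # average quarterly value
    a = sum(ys) / n
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    step = b / 4                                      # quarterly increment
    ratios = []
    for x, year in zip(xs, data):
        mid = a + b * x                               # trend at the middle of the year
        trend = [mid + step * (q - 1.5) for q in range(4)]
        ratios.append([100 * y / t for y, t in zip(year, trend)])
    prelim = [sum(col) / n for col in zip(*ratios)]   # average over the years
    k = 400 / sum(prelim)                             # correction factor
    return a, b, [p * k for p in prelim]


coal = [
    [68, 60, 61, 63],
    [70, 58, 56, 60],
    [68, 63, 68, 67],
    [65, 59, 56, 62],
    [60, 55, 51, 58],
]
a, b, indices = ratio_to_trend_indices(coal)
print(round(a, 2), round(b, 2))
print([round(v) for v in indices])
```

Placing the yearly trend value at the middle of the year and stepping by b/4 on either side is the same adjustment as the one described in words in the solution.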

Remarks. 1. Since the sum of seasonal indices is 399.9,
which is approximately 400, we may not apply any adjustment in
this case.

2. Rounding to whole numbers, the quarterly seasonal indices
are 107, 96, 95, 102 respectively.

3. In obtaining the trend values, we fitted a linear trend equation
to average quarterly production. However, we could have
fitted a straight line trend to annual (total) values and then, finally
adjusted the trend equation to quarterly values [c.f. (11.28)].

11.6.3. ‘Ratio to Moving Average’ Method. This is an improvement
over the ‘Ratio to Trend’ method as it tries to eliminate
the cyclical variations which are mixed up with seasonal indices in
the ‘Ratio to Trend’ method. ‘Ratio to Moving Average’ is the most
widely used method of measuring seasonal fluctuations and involves
the following steps :

(i) Obtain centred 12-month (4-quarter) moving average
values for the given series. Since the variations recur after a span of
12 months for monthly data (4 quarters for quarterly data), a 12-month
(4-quarter) moving average will completely eliminate the
seasonal variations provided they are of constant pattern and intensity.
Accordingly, the 12-month (4-quarter) moving average
values may be regarded to contain trend and cyclic components, viz.,
T × C, as the averaging process tries to eliminate the irregular
component.


(ii) Express the original values as a percentage of centred
moving average values for all months (quarters) except for the first
6 months (2 quarters) and 6 months (2 quarters) at the end. Using
multiplicative model of time series, these percentages give :

$$\frac{\text{Original value}}{\text{M.A. value}} \times 100 = \frac{TSCI}{TC} \times 100 = SI \times 100 \qquad \ldots(11.38)$$

Hence the ‘ratio to moving average’ represents the seasonal
and irregular components.


(iii) As in the ‘simple averages’ and ‘ratio to trend’ methods,
arrange these percentages according to years and months (quarters).
Preliminary seasonal indices are obtained on eliminating the irregular
component by averaging these percentages for each month
(quarter), the average being taken over different years. Arithmetic
mean or median may be used for averaging.


(iv) The sum of these indices should be 1200 (or 400) for
monthly (or quarterly) data. If it is not so, then these seasonal
indices obtained in step (iii) are adjusted to a total of 1200 (or 400)
by multiplying each of them with a constant factor

$$k = \frac{1200}{\text{Sum of monthly indices}} \quad \text{or} \quad k = \frac{400}{\text{Sum of quarterly indices}}$$

for monthly and quarterly data respectively. This last step amounts
to expressing each of the preliminary indices as a percentage of their
arithmetic mean.


[...] provided the cyclical fluctuations are regular in periodicity as well
as amplitude. An obvious drawback of this method is that there is
loss of some trend values in the beginning and at the end and accordingly
seasonal indices for the first six months (or 2 quarters) of the
first year and last six months (or 2 quarters) of the last year cannot
be determined.


Remarks. 1. Specific Seasonal Index and Typical Seasonal
Index. The seasonal indices for each month (quarter) of different
years are also known as specific seasonals and the average of
specific seasonals for each month (quarter) for a number of years
are termed as typical seasonals.

2. Additive Model. If we use additive model of the time
series, then the method of moving averages for computing seasonal
indices involves the following steps. [We shall state the steps for
monthly data and these can be modified accordingly for quarterly
and other data.]

(i) Obtain 12-month moving average values. These will contain
trend and cyclic components, i.e., they will represent (T + C).

(ii) Trend eliminated values are obtained on subtracting these
moving average values from the given time series values to give :

$$y - \text{M.A. values} = (T + S + C + I) - (T + C) = S + I \qquad \ldots(11.39)$$

(iii) Irregular component is eliminated on averaging these
(S + I) values for each month over different years and we get the
preliminary indices for each month.

(iv) Sum of these indices should be zero. In case it is not so,
the preliminary indices obtained in step (iii) are adjusted to a total
of zero by subtracting from each of them a constant factor,

$$k = \frac{1}{12}\,[\text{Sum of the monthly seasonal indices}]$$
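
The additive steps (i) to (iv) can be sketched as follows. The function name `additive_seasonal_effects` is hypothetical, and the test series is made up: a straight line trend plus a fixed quarterly pattern (3, -1, -4, 2) with no irregular component, chosen so that the method should return that pattern exactly. The sketch uses quarterly data, so the constant k is the sum of the preliminary indices divided by 4 instead of 12.

```python
def additive_seasonal_effects(values, period=4):
    """Seasonal effects under the additive model via a centred moving average.

    values is a flat list starting at season 1; period is assumed even.
    """
    half = period // 2
    ma = [sum(values[i:i + period]) / period
          for i in range(len(values) - period + 1)]
    # Centring: centred[j] is aligned with values[j + half].
    centred = [(a + b) / 2 for a, b in zip(ma, ma[1:])]
    buckets = [[] for _ in range(period)]
    for offset, trend_cycle in enumerate(centred):
        i = offset + half
        buckets[i % period].append(values[i] - trend_cycle)   # y - (T + C) = S + I
    prelim = [sum(b) / len(b) for b in buckets]               # average over the years
    k = sum(prelim) / period                                  # adjustment to a zero total
    return [p - k for p in prelim]


pattern = [3, -1, -4, 2]                                      # made-up seasonal effects
series = [10 + 2 * i + pattern[i % 4] for i in range(12)]     # three years, quarterly
effects = additive_seasonal_effects(series)
print([round(e, 2) for e in effects])
```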


‘Ratio to Moving Average’ method, using the multiplicative
model, is illustrated in Example 11.19.

Example 11.19. Calculate seasonal indices by the ‘ratio to moving
average’ method from the following data :

Year    I Quarter   II Quarter   III Quarter   IV Quarter
1971    68          62           61            63
1972    65          58           66            61
1973    68          63           63            67
[Delhi Uni. B. Com. (Hons.) 1975]

Solution.

Year and       Values   4-Quarterly   4-Qrt.   2-Period Moving   4-Qrt. M.A.   Ratio to M.A. Values
Quarter                 Moving        M.A.     Totals of         Centred       = 100 × col. (2)/col. (6)
                        Totals                 col. (4)
(1)            (2)      (3)           (4)      (5)               (6)           (7)

1971 I Qrt.    68
     II Qrt.   62
                        254           63.50
     III Qrt.  61                              126.25            63.125         96.63
                        251           62.75
     IV Qrt.   63                              124.50            62.250        101.20
                        247           61.75
1972 I Qrt.    65                              124.75            62.375        104.21
                        252           63.00
     II Qrt.   58                              125.50            62.750         92.43
                        250           62.50
     III Qrt.  66                              125.75            62.875        104.97
                        253           63.25
     IV Qrt.   61                              127.75            63.875         95.50
                        258           64.50
1973 I Qrt.    68                              128.25            64.125        106.04
                        255           63.75
     II Qrt.   63                              129.00            64.500         97.67
                        261           65.25
     III Qrt.  63
     IV Qrt.   67


COMPUTATION OF SEASONAL INDICES

                      Trend Eliminated Values
Year                  I Qrt.    II Qrt.   III Qrt.   IV Qrt.
1971                  -         -          96.63     101.20
1972                  104.21     92.43    104.97      95.50
1973                  106.04     97.67    -          -
Total                 210.25    190.10    201.60     196.70
Average (A.M.)
(S.I.)                105.13     95.05    100.80      98.35
Adjusted Seasonal
Indices               105.31     95.21    100.97      98.52


Sum of seasonal indices is :
105.13 + 95.05 + 100.80 + 98.35 = 399.33
which is less than 400. These indices are, therefore, adjusted to a
total of 400 by multiplying each of them by a constant factor :

$$k = \frac{400}{399.33} = 1.0017$$

11.6.4. Method of Link Relatives. [...]
the concept of ‘Link Relatives’ in the last [...]
chain index numbers. Link Relative (L.R.) [...]
phenomenon in any season (month, for m[...]

$$\text{Link Relative for any month} = \frac{\text{Current month's value}}{\text{Previous month's value}} \times 100 \qquad \ldots(11.40)$$

For example,

$$\text{L.R. for March} = \frac{\text{Value (figure) for March}}{\text{Value (figure) for February}} \times 100 \qquad \ldots(11.40a)$$

The construction of indices of seasonal variation by the Link
Relatives method, also known as Pearson's method, involves the
following steps.

(i) Convert the original data into link relatives by formula
(11.40), i.e., express each value as a percentage of the preceding
value.


(ii) As in the case of ‘Ratio to Trend’ or ‘Ratio to M.A.’
method, average these link relatives for each month, the average
being taken over the given number of years. Arithmetic mean or
median may be used for averaging. Median is preferred to A.M.
as the latter gives undue importance to extreme observations which
are not basically due to seasonal swings.


(iii) Convert these link relatives (L.R.) into chain relatives
(C.R.) on the basis of the 1st season by the formula :

$$\text{C.R. for any month} = \frac{\text{L.R. of that month} \times \text{C.R. of preceding month}}{100} \qquad \ldots(11.41)$$

the chain relative for January being taken as 100. For example :

$$\text{C.R. for February} = \frac{\text{L.R. of Feb.} \times \text{C.R. of January}}{100} = \text{L.R. of February} \quad (\because \text{C.R. of Jan.} = 100)$$

$$\text{C.R. for March} = \frac{\text{L.R. of March} \times \text{C.R. of Feb.}}{100}$$

$$\vdots$$

$$\text{C.R. for December} = \frac{\text{L.R. of Dec.} \times \text{C.R. of Nov.}}{100}$$


(iv) Obtain the C.R. for the first month, viz., January, on the basis
of the December chain relative, which is given by :

$$\text{New C.R. for January} = \frac{\text{L.R. of January} \times \text{C.R. of Dec.}}{100} \qquad \ldots(11.42)$$

Usually, this will not be 100 due to the effect of long-term
secular trend and accordingly the chain indices are to be adjusted or
corrected for the effect of trend.

(v) This adjustment is done by subtracting a ‘correction factor’
from each of the chain relatives. Let us write:

    d = (1/12) [New C.R. for January − 100]             ...(11.43)

If we assume a straight line trend, then the correction factor
for February, March, ..., December is d, 2d, ..., 11d respectively.


(vi) The indices of seasonal variation are obtained on adjusting
these corrected chain relatives to a total of 1200, by expressing
each of them as a percentage of their arithmetic mean. This amounts
to multiplying each of them by a constant factor

    k = 100 / (Mean of the corrected monthly chain relatives)


Remark. For quarterly data, we write

    d = (1/4) [New C.R. for 1st Quarter − 100]

and the corrected C.R.’s for 2nd, 3rd and 4th quarter are obtained
on subtracting d, 2d and 3d from the C.R.’s obtained in step (iii).

Finally, adjust these corrected C.R.’s to a total of 400, by
multiplying each of them by a constant factor,

    k = 400 / (Sum of the corrected quarterly C.R.’s)

to get the indices of seasonal variation.

Merits and Demerits. (i) The averaged link relatives include
both the cyclic and trend components. Though trend is subsequently
eliminated by applying correction, the indices obtained will be truly
representative only if the data really exhibit a straight line trend.
However, this is not so in most of the economic and business time
series.

(ii) Though not so easy to understand as the moving average
method, the actual calculations involved in this method are much
less extensive than the ‘Ratio to M.A.’ or ‘Ratio to Trend’ method.

(iii) There is loss of only one link relative i.e., for the first
season while in case of moving average method we lose some of
the values (trend and seasonal) in the beginning and at the end.
Thus, ‘Link Relatives’ method utilises the data more completely.


Example 11.20. Compute the seasonal indices by the ‘Link
Relatives’ method for the following data:

            Wheat Prices (in Rupees per Quintal)

    Quarter / Year         1970    1971    1972    1973
    1st (Jan.-March)         75      86      90     100
    2nd (April-June)         60      65      72      78
    3rd (July-Sept.)         54      63      66      72
    4th (Oct.-Dec.)          59      80      85      93

Solution.

            COMPUTATION OF SEASONAL INDICES BY
                  LINK RELATIVES METHOD

                              LINK RELATIVES
    Year                I Qrt.    II Qrt.   III Qrt.   IV Qrt.
    1970                  -        80.00     90.00     109.26
    1971                145.76     75.58     96.92     126.98
    1972                112.50     80.00     91.67     128.79
    1973                117.65     78.00     92.31     129.17

    Total of L.R.’s     375.91    313.58    370.90     494.20
    Average L.R. (A.M.) 125.303    78.395    92.725    123.55
    Chain Relatives     100.000    78.395    72.690     89.810
    Adjusted C.R.       100.00     75.26     66.42      80.41
    Seasonal Indices    124.20     93.47     82.49      99.87

The chain relatives in the table are obtained as:

    II Qrt.  : (78.395 × 100) / 100     = 78.395
    III Qrt. : (92.725 × 78.395) / 100  = 72.690
    IV Qrt.  : (123.55 × 72.69) / 100   = 89.810

and the adjusted chain relatives as:

    II Qrt.  : 78.395 − 3.135 = 75.26
    III Qrt. : 72.690 − 6.270 = 66.42
    IV Qrt.  : 89.810 − 9.405 = 80.41


The New (Second) C.R. for 1st quarter is:

    (L.R. of 1st Qrt. × C.R. of last (4th) Qrt.) / 100
        = (125.303 × 89.81) / 100 = 112.54

We have:

    d = (1/4) [New C.R. of 1st Qrt. − 100]
      = (1/4) (112.54 − 100) = 3.135

Adjusted C.R.’s for 2nd, 3rd and 4th quarters are obtained on
subtracting d, 2d and 3d from the corresponding C.R.’s.

    Sum of adjusted C.R.’s = 100 + 75.26 + 66.42 + 80.41 = 322.09

Indices of seasonal variations are obtained on adjusting these
adjusted C.R.’s to a total of 400 by multiplying each one of them
with a constant factor,

    k = 400 / (Sum of adjusted C.R.’s) = 400/322.09 = 1.242,

and are given in the last row of the table above.
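
The steps (i) to (vi) of the Link Relatives method may also be
sketched as a short Python routine. This is only an illustration and
not part of the original text: the function name
link_relative_indices and its argument layout are assumptions, the
arithmetic mean is used for averaging in step (ii), and the data are
the wheat prices of Example 11.20 (with 85 for the 4th quarter of
1972, the value implied by the totals in the table).

```python
def link_relative_indices(data):
    """Seasonal indices by the Link Relatives method.

    data: list of years, each a list of season values in time order
          (hypothetical layout chosen for this sketch).
    """
    n_seasons = len(data[0])
    flat = [v for year in data for v in year]

    # Step (i): each value as a percentage of the preceding value.
    # The very first value has no link relative.
    lr = [None] + [100.0 * flat[i] / flat[i - 1] for i in range(1, len(flat))]

    # Step (ii): average the link relatives season by season (A.M.).
    avg_lr = []
    for s in range(n_seasons):
        vals = [lr[i] for i in range(s, len(flat), n_seasons) if lr[i] is not None]
        avg_lr.append(sum(vals) / len(vals))

    # Step (iii): chain relatives, the first season being taken as 100.
    cr = [100.0]
    for s in range(1, n_seasons):
        cr.append(avg_lr[s] * cr[s - 1] / 100.0)

    # Step (iv): new chain relative of the first season from the last one.
    new_cr_first = avg_lr[0] * cr[-1] / 100.0

    # Step (v): straight line trend correction 0, d, 2d, ...
    d = (new_cr_first - 100.0) / n_seasons
    adjusted = [cr[s] - s * d for s in range(n_seasons)]

    # Step (vi): scale so that the indices total 100 * n_seasons.
    k = 100.0 * n_seasons / sum(adjusted)
    return [k * a for a in adjusted]


wheat = [
    [75, 60, 54, 59],    # 1970
    [86, 65, 63, 80],    # 1971
    [90, 72, 66, 85],    # 1972
    [100, 78, 72, 93],   # 1973
]
indices = link_relative_indices(wheat)
print([round(x, 2) for x in indices])
```

The printed indices agree with the last row of the table up to
rounding. Small differences in the second decimal place arise because
the worked solution rounds the intermediate values (for instance
k = 1.242), whereas the routine carries full precision throughout.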


11.6.5. Deseasonalisation of Data. As already pointed out,
the objective of studying seasonal variations is (i) to measure them
and (ii) to eliminate them from the given series. Elimination of the
seasonal effects from the given values is termed as deseasonalisation
of the data. It helps us to adjust the given time series for
seasonal variations, thus leaving us with trend component, cyclical
and irregular movements. Assuming multiplicative model of the
time series, the deseasonalised (seasonality eliminated) values are
obtained on dividing the given values by the corresponding indices
of seasonal variations. Thus,

    Deseasonalised Data = y/S = (T × S × C × I)/S = T × C × I
                                                        ...(11.44)


Deseasonalisation is specially needed for the study of cyclic
component. It also helps businessmen and management executives
in planning future production programmes, in forecasting and
in managerial control. It also helps in proper interpretation of the
data. For example, if the values are not adjusted for seasonality,
then seasonal upswings (or downswings) may be misinterpreted as
periods of boom and prosperity (or depression) in business.

Remark. In case of absolute seasonal variations (additive
model of time series), the deseasonalised values are obtained on
subtracting the seasonal variations from the given values. Thus,

    Deseasonalised Data = y − S
                        = (T + S + C + I) − S
                        = T + C + I


Example 11.21. Deseasonalise the following data with the help
of the seasonal data given against:

    Month        Cash Balance (’000 Rs.)    Seasonal Index
    January              360                     120
    February             400                      80
    March                550                     110
    April                360                      90
    May                  350                      70
    June                 550                     100

                            [C.A. (Intermediate), May 1983]

Solution. Deseasonalised values are obtained on dividing
the given time series values (Y) by the seasonal effect, assuming that
the given series follows the multiplicative model of decomposition.
We have:

    Seasonal effect = Seasonal Index / 100 = S.I. / 100

Hence, using multiplicative model Y = T × S × C × I,

    Deseasonalised value = Y / Seasonal effect = (Y / S.I.) × 100

            COMPUTATION OF DESEASONALISED VALUES

    Month       Cash Balance   Seasonal   Deseasonalised Value
                (’000 Rs.)     Index      = (Y / S.I.) × 100
                   (Y)         (S.I.)
    January        360          120       (360/120) × 100 = 300
    February       400           80       (400/80) × 100  = 500
    March          550          110       (550/110) × 100 = 500
    April          360           90       (360/90) × 100  = 400
    May            350           70       (350/70) × 100  = 500
    June           550          100       (550/100) × 100 = 550


Remark. If we assume the additive model of decomposition, 
then the deseasonalised values are given by (Y- S.I.). 
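
The computation of Example 11.21 can be reproduced with a few lines
of Python. The sketch below is illustrative only: the function name
deseasonalise and its model argument are assumptions made here, the
multiplicative branch takes seasonal indices in percentage form, and
the additive branch takes absolute seasonal variations in the same
units as the data.

```python
def deseasonalise(values, seasonal, model="multiplicative"):
    """Remove the seasonal effect from a series.

    multiplicative: seasonal holds indices in percentage form,
                    result = (Y / S.I.) * 100
    additive:       seasonal holds absolute seasonal variations,
                    result = Y - S
    """
    if model == "multiplicative":
        return [100.0 * y / s for y, s in zip(values, seasonal)]
    if model == "additive":
        return [y - s for y, s in zip(values, seasonal)]
    raise ValueError("model must be 'multiplicative' or 'additive'")


# Data of Example 11.21: cash balance ('000 Rs.) and seasonal index.
months = ["January", "February", "March", "April", "May", "June"]
cash = [360, 400, 550, 360, 350, 550]
seasonal_index = [120, 80, 110, 90, 70, 100]

for month, value in zip(months, deseasonalise(cash, seasonal_index)):
    print(f"{month:<10}{value:8.1f}")
```

The values printed are those of the last column of the table in the
solution, viz., 300, 500, 500, 400, 500 and 550.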
Example 11.22. The seasonal indices of the sales of garments
of a particular type in a certain shop are given below:

    Quarter          Seasonal index
    Jan.-March             97
    Apr.-June              85
    July-Sept.             83
    Oct.-Dec.             135

If the total sales in the first quarter of a year be worth
Rs. 15,000 and sales are expected to rise by 4% in each quarter,
determine how much worth of garments of this type should be kept in
stock by the shop-owner to meet the demand for each of the three
other quarters of the year.
                            [I.C.W.A. (Final), June 1984]

Solution. Since the sales are expected to rise by 4% in each
quarter, we have:

    Expected sales in any quarter
        = 104% of the value of previous quarter             ...(*)

Taking into account the seasonal index (S.I.) for each quarter,
we have:

    Stock in any quarter
        = (Expected Sales × S.I./100) of that quarter      ...(**)

Using the formulae in (*) and (**) we can find the stock which
the shop-owner should keep in the shop to meet the demand for
each of the three other quarters of the year as explained below:

    Quarter      Seasonal  Seasonal  Expected Sales              Stock worth
                 index     effect    (in Rs.)                    (Rs.)
                           S.I./100                              = (3) × (4)
      (1)          (2)       (3)       (4)                          (5)
    Jan.-March      97      0.97     15,000                        14,550
    Apr.-June       85      0.85     (104/100) × 15,000 = 15,600   13,260
    July-Sept.      83      0.83     (104/100) × 15,600 = 16,224   13,466
    Oct.-Dec.      135      1.35     (104/100) × 16,224 = 16,873   22,779
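
The two formulae (*) and (**) lend themselves to a simple loop. The
following Python sketch is only an illustration: the function name
stock_plan and its parameters are assumptions made here, and the
first-quarter figure of Rs. 15,000 is the one used in the table of
the solution.

```python
def stock_plan(first_quarter_sales, seasonal_indices, growth_pct):
    """Return (expected sales, stock) for each quarter.

    Expected sales grow by growth_pct per cent from quarter to
    quarter, formula (*); stock = expected sales * S.I. / 100,
    formula (**).
    """
    rows = []
    expected = float(first_quarter_sales)
    for quarter, si in enumerate(seasonal_indices):
        if quarter > 0:
            expected *= 1 + growth_pct / 100.0
        rows.append((expected, expected * si / 100.0))
    return rows


quarters = ["Jan.-March", "Apr.-June", "July-Sept.", "Oct.-Dec."]
plan = stock_plan(15000, [97, 85, 83, 135], 4)
for name, (sales, stock) in zip(quarters, plan):
    print(f"{name:<12}{sales:12.2f}{stock:12.2f}")
```

Rounded to the nearest rupee, the stock figures are those of column
(5) of the table: 14,550; 13,260; 13,466 and 22,779.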


of sales for the month of September? [I.C.W.A. (Final), Dec. 1977]


Solution. The owner of the company was justifiably not satisfied
with the rise of sales of Rs. (69,000 − 60,000) = Rs. 9,000 from
August to September because on the basis of the seasonal index of
September, the estimated sales for September should have been:

    Rs. (60,000/105) × 140 = Rs. 80,000

Thus the actual sales of Rs. 69,000 for September is much
below the expected sales and hence the dissatisfaction of the owner
is justified.

Aliter: Actual sales for August are Rs. 60,000 and seasonal
index for August is 105. Hence the seasonal effect for August is
1.05 and accordingly the expected monthly sales are:

    Rs. 60,000 ÷ 1.05 = Rs. 60,000/1.05

Seasonal effect for September is 140 ÷ 100 = 1.40 and, therefore,
the estimated sales for September are:

    Rs. (60,000/1.05) × 1.40 = Rs. 80,000

11.7. Measurement of Cyclical Variations. An approximate
or crude method of measuring cyclical variations is the ‘Residual
Method’ which consists in first estimating trend (T) and seasonal
(S) components and then eliminating their effect from the given
time series. Assuming multiplicative model of the time series,
these components (T and S) are eliminated on dividing the given
time series values by T × S viz.,

    Y / (T × S) = (T × S × C × I) / (T × S) = C × I      ...(11.46)

thus leaving us with cyclical and irregular movements.

If we ignore the random or irregular variations or assume that
their effect is not very significant, then the values obtained in
(11.46) may be taken to reflect cyclical variations.

To arrive at better estimates of cyclical fluctuations, the irregular
component (I) should be eliminated from the CI values
obtained in (11.46). But irregular movements, by their nature, cannot
be determined as they are the residuals after adjusting the given
data for trend, seasonal and cyclical variations. An attempt is then
made to iron out or smoothen the irregular component by taking a
moving average of these CI values.

Steps in the computation of cyclical variations by the ‘residual
method’ may be summarised as follows:

(i) Compute trend values (T) and the seasonal indices (S)
preferably by the moving average method. S should be in fraction
form and not in percentage form.

(ii) Divide given values by T × S. This step may be divided
into two steps viz.,

    (a) Divide Y by T to get SCI.

    (b) Divide SCI by S to get CI.

(iii) Take M.A. of the CI values obtained in Step (ii) above.
For monthly data, often 3-month or 5-month moving average
may be used.

Remarks 1. The ‘residual method’ will give effective results
only if the trend component and seasonal fluctuations are correctly
measured. This is, by far, the most commonly used method of
measuring cyclical variations.

2. The problem of taking M.A. of the CI values involves
two questions:

    (i) Period of M.A.
    (ii) Weighting system to be used.

For a detailed study of these, the reader is referred to Applied
General Statistics by Croxton and Cowden.

3. The other methods for studying the cyclical variations are:

    (i) Reference Cycle Analysis Method.
    (ii) Direct Percentage Variation Method.
    (iii) Fitting of Sine Functions Method or Harmonic Analysis.

For a detailed study of these methods the reader is referred
to Applied General Statistics by Croxton and Cowden.

Example 11.24. Obtain the estimates of the cyclical variations 
for the data of Example 11.20. 


        COMPUTATION OF INDICES OF CYCLICAL VARIATIONS

    Year  Quarter  Price  Seasonal  Deseasonalised  Trend (T):     CI =        3-quarterly
                    (Y)   index     value           4-quarterly    (5)/(6)     M.A. of (7)
                          (S.I.)    (3)/(4) × 100   centred M.A.   × 100
    (1)     (2)     (3)    (4)          (5)             (6)          (7)           (8)

    1970     1       75   122.36       61.29             -            -             -
             2       60    92.42       64.92             -            -             -
             3       54    84.69       63.76          63.375       100.61           -
             4       59   100.51       58.70          65.375        89.79         98.37
    1971     1       86   122.36       70.28          67.125       104.70         97.91
             2       65    92.42       70.33          70.875        99.23        101.49
             3       63    84.69       74.39          74.000       100.53        101.78
             4       80   100.51       79.59          75.375       105.59        100.70
    1972     1       90   122.36       73.55          76.625        95.99        100.65
             2       72    92.42       77.91          77.625       100.37         98.13
             3       66    84.69       77.93          79.500        98.03        100.72
             4       85   100.51       84.57          81.500       103.77        100.09
    1973     1      100   122.36       81.73          83.000        98.47        100.61
             2       78    92.42       84.40          84.750        99.59           -
             3       72    84.69       85.02             -            -             -
             4       93   100.51       92.53             -            -             -

Last column (8) gives indices of cyclical variations.
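
The ‘residual method’ of Example 11.24 can be traced step by step
in Python. The sketch below is an illustration under stated
assumptions and not a definitive implementation: the function names
are chosen here, the trend is taken as the centred moving average of
the original values with period equal to the number of seasons
(assumed even, as for quarterly or monthly data), the seasonal
indices 122.36, 92.42, 84.69 and 100.51 are those of column (4), and
a simple unweighted 3-quarterly moving average is used for
smoothing.

```python
def centred_moving_average(y, period):
    """Centred M.A. for an even period; None where it is undefined."""
    half = period // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        first = sum(y[t - half:t + half]) / period
        second = sum(y[t - half + 1:t + half + 1]) / period
        out[t] = (first + second) / 2
    return out


def simple_moving_average(x, period):
    """Unweighted M.A. for an odd period; None where values are missing."""
    half = period // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        if all(v is not None for v in window):
            out[t] = sum(window) / period
    return out


def cyclical_indices(y, seasonal_indices, ma_period=3):
    """Residual method: CI = Y / (T x S), then smooth CI by a M.A."""
    n_seasons = len(seasonal_indices)
    trend = centred_moving_average(y, n_seasons)
    ci = []
    for t, value in enumerate(y):
        if trend[t] is None:
            ci.append(None)
        else:
            deseasonalised = 100.0 * value / seasonal_indices[t % n_seasons]
            ci.append(100.0 * deseasonalised / trend[t])
    return simple_moving_average(ci, ma_period)


prices = [75, 60, 54, 59, 86, 65, 63, 80,
          90, 72, 66, 85, 100, 78, 72, 93]      # 1970 to 1973, quarterly
seasonal = [122.36, 92.42, 84.69, 100.51]       # column (4) of the table
for t, c in enumerate(cyclical_indices(prices, seasonal)):
    label = f"{1970 + t // 4} Q{t % 4 + 1}"
    print(label, "-" if c is None else round(c, 2))
```

The values obtained agree with column (8) of the table up to
rounding in the second decimal place, since the table rounds columns
(5) and (7) before averaging.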


11.8. Measurement of Irregular Variations. By the nature of these
movements, no formula, however approximate, can be suggested
to obtain an estimate of the irregular component in a time series. In
practice, the three components of a time series viz., Trend (T),
Seasonal (S) and Cyclical (C) are obtained and the irregular component
is obtained as a residual which is unaccounted for by these
components after eliminating them from the given series. Using the
multiplicative model of time series, the random or irregular component
is given by:

    I = Y / (T × S × C) = (T × S × C × I) / (T × S × C)

where S and C are in fractional form and not in percentage
form.

However, in practice, the cycle behaves in an erratic manner
because successive cycles vary widely in period, amplitude and
pattern and accordingly it is very difficult to measure the cyclical
variations accurately. Moreover, they are so much inter-mixed with
irregular variations that, quite often, it becomes practically impossible
to separate them. Accordingly, in analysis of time series, trend
and seasonal components are measured separately and after eliminating
their effect the cyclical and irregular fluctuations (C × I) are
left together.


Remark. Although the random or irregular component cannot
be estimated accurately, we can obtain an estimate of the variance
of the random component by the ‘Variate Difference’ method.
The discussion is, however, beyond the scope of the book.


EXERCISE 11.3 
1. What do you understand by ‘seasonal variations’ in time series 
data ? Explain with few examples the utility of such a study. 
[C.A. (Intermediate), May 1979] 


2.(a) Explain the meaning of Time Series. What are its main compo- 
nents ? How would you study seasonal variations in a Time Series ? 
(Guru Nanak Dev Uni. B. Com., II April 1982) 


(b) What are different components of an economic time series? Name
the methods of determining seasonal index. [Delhi Uni. B. Com. (Hons.), 1981]


(c) What do you understand by seasonal indices? What methods are 
used to determine them ? [C.A. (Intermediate), May 1981] 
3. Explain the different components of an economic time series. How
would you statistically eliminate the influence of seasonal and cyclical factors
on the long period movement of any series? [Punjab Uni. B. Com., Sept. 1978]

4. Explain what is meant by seasonal fluctuations of a time series.
Discuss the different methods for determining seasonal fluctuations of a given
time series. Discuss the relative merits and demerits of each of these methods.
Also state the conditions of applicability for each of the methods.

5. What do you mean by seasonal fluctuations in time series? Give
examples.

Explain the method of ‘Simple Averages’ for obtaining indices of seasonal
variations. Discuss its relative merits and demerits.


6. Compute the seasonal averages and seasonal indices for the following
time series.

    Month    1974  1975  1976      Month    1974  1975  1976
    Jan.      15    23    25       July      20    22    30
    Feb.      16    22    25       Aug.      28    28    34
    March     18    28    35       Sept.     29    32    38
    April     18    27    36       Oct.      33    37    47
    May       23    31    36       Nov.      33    34    41
    June      23    28    30       Dec.      38    44    53

                            [Bombay Uni. B. Com. 1976]
[Hint. Use Method of Simple Averages]
Ans. 70, 70, 90, 90, 100, 90, 80, 100, 110, 130, 120, 150

7. Calculate the seasonal index from the following data using the
average method:

    Year    1st Quarter  2nd Quarter  3rd Quarter  4th Quarter
    1974        72           68           80           70
    1975        76           70           82           74
    1976        74           66           84           80
    1977        76           74           84           78
    1978        78           74           86           82

                            [C.A. (Intermediate), May 1979]
Ans. 98.43, 92.15, 108.90, 100.52


8. Explain ‘ratio to trend’ method of measuring seasonal variations
and discuss its relative merits and demerits.

Find seasonal variations by the ratio-to-trend method from the data
given below:

    Year    1st Quarter  2nd Quarter  3rd Quarter  4th Quarter
    1973        30           40           36           34
    1974        34           52           50           44
    1975        40           58           54           48
    1976        54           76           68           62
    1977        80           92           86           82

                            [Delhi U. B. Com. (Hons.) C.C. 1981]
Ans. Straight line trend is given by:
    y = 56 + 12x,
Origin: 1975 (1st July); x unit = 1 year; y units: Average quarterly
values.
Seasonal Indices: 92.0, 117.4, 102.1, 88.5.

9. (a) Describe the ‘ratio to moving average’ and the ‘ratio to trend’
methods of estimating seasonal indices. Compare the two methods.
                [Delhi Uni. B. Com. (Hons.) II, 1984;
                 C.A. (Intermediate), November 1983]

(b) Explain why ‘ratio to moving average’ method is considered to be
the best measure of seasonal fluctuations.
                [Delhi Uni. B. Com. (Hons.), 1980]

(c) Describe, step by step, the moving average method of determining
seasonal index. [Delhi Uni. B. Com. (Hons.), 1978]

10. Given the following information, calculate the seasonal index
number for each of the four quarters:

    Ratios of Observed Values to Moving Averages (%)

                     Quarter
    Year       I      II     III     IV
    1980      106    124     104     90
    1981       84    114     107     88
    1982       90    112     101     85
    1983       76     94      91     76
    1984       80    104      95     83
    1985      104    112     102     84

                [Delhi Uni. B.A. (Econ. Hons. I), 1986]

11. The following are the figures of quarterly production, for which
some four quarterly centred moving averages have been calculated:

    Year    Quarter    Production    Moving average
    1972       1          216             -
               2          281             -
               3          209           227
               4          200           226.13
    1973       1          220           229.88
               2          270           237.50
               3          250           243.75
               4          220           252.50
    1974       1          250
               2          310
               3          280
               4          246

Calculate the remaining values of moving averages. Treating the moving
averages as trend values, compute the seasonal indices.
                [Bombay Uni. B. Com., April 1976]
Ans. M.A. values for I and II Quarter of 1974 are: 261.25, 268.25.
Assuming multiplicative model of time series, Seasonal Indices are:
96.65, 115.77, 98.29, 88.67.


12. Given the following quarterly sales figures in thousands of rupees for
the years 1966-1969, find the specific seasonals by the method of moving averages.

    Year       I      II     III     IV
    1966      290    280     285    310
    1967      320    305     310    330
    1968      340    321     320    340
    1969      370    360     362    380

                [Delhi Uni. B.A. (Econ. Hons. I), 1984]
Ans. 104.20, 97.90, 96.50, 101.40
13. (a) Enumerate the various steps you would take in determining
seasonal indices by Link Relative Method.
                [C.A. (Intermediate) (N.S.), November 1982]

(b) Explain ‘ratio-to-link relatives’ method of measuring seasonal
variations.
                [Delhi Uni. B. Com. (Hons.), 1986]

(c) What do you mean by Link Relative? Explain the ‘link relative
method’ of computing indices of seasonal variations. Discuss its merits and
demerits.

14. Obtain the seasonal indices by the link relative method, for the
following data:

    AVERAGE QUARTERLY PRICE OF A COMMODITY

    Quarter    1968    1969    1970
       I        31      31      34
       II       29      31      36
       III      28      25      26
       IV       32      35      33

Ans. 108.02, 99.75, 81.23, 111.00.

15. Apply method of link relatives to the following data and calculate
seasonal indices.

    QUARTERLY FIGURES

    Quarter    1971    1972    1973    1974    1975
       I        6.0     5.4     6.8     7.2     6.6
       II       6.5     7.9     6.5     5.8     7.3
       III      7.8     8.4     9.3     7.5     8.0
       IV       8.7     7.3     6.4     8.5     7.3

Ans. 88.18, 94.01, 113.21, 104.60.


16. (a) What do you understand by deseasonalisation of data? Explain
its uses.

(b) The seasonal indices of sales of a firm are as under:

    January     106        July         93
    February    105        August       89
    March       101        September    92
    April       104        October     102
    May          98        November    106
    June         96        December    108

If the firm is expecting total sales of Rs. 42,00,000 during 1986, estimate
the sales for the individual months of 1986.
                [Delhi Uni. B. Com. (Hons. II), 1986]

Ans. Estimated sales (in ’000 Rs.) for January to December are:

    4452, 4410, 4242, 4368, 4116, 4032, 3906, 3738, 3864, 4284,
    4452, 4536

17. The seasonal indices of the sale of readymade garments of a particular
type in a certain store are given below:

    Quarter          Seasonal Index
    Jan.-March            98
    April-June            89
    July-Sep.             82
    Oct.-Dec.            130

If the total sales in the first quarter of the year be worth Rs. 10,000,
determine how much worth of garments of this type should be kept in stock by
the store to meet the demand in each of the remaining quarters.
                [Delhi Uni. M. Com., 1982]

Ans. Quarter:                   II         III         IV
     Estimated Sales (Rs.):   9081.63    8367.35    13265.30


18. The sales of a company rose from Rs. 40,000 in March to Rs.
48,000 in April 1957. The company's seasonal indices for these two months are
105 and 140 respectively. The owner of the company expressed dissatisfaction
with the April sales, but the Sales Manager said that he was quite pleased with
the Rs. 8,000 increase. What argument should the owner of the company have
used to reply to the Sales Manager?

The Sales Manager also predicted on the basis of the April sales that the
total 1957 sales were going to be Rs. 5,76,000. Criticise the Sales Manager's
estimate and explain how the estimate of Rs. 4,11,000 may be arrived at.

Ans. Owner's estimate of Sales for April 1957
        = (40,000/105) × 140 = Rs. 53,333.33.
Sales Manager ignored the S.I. for April 1957.
Owner's estimate of Annual Sales for 1957
        = (48,000/140) × 100 × 12 = Rs. 4,11,000 (nearest thousand).

19. What are the different components of a time series? Describe how
you will measure short period fluctuations in a time series.
                [Punjab Uni. B.A. (Econ. Hons. II), April 1983]

20. How is the analysis of time series useful in business and industry?
Describe briefly the phases of a business cycle.
                [Bombay Uni. B. Com., April 1982]


21. Explain the term ‘cyclical component of a time series’. Describe a
method for obtaining this component from a given series of monthly data.
Explain any procedure known to you for detecting the presence of a cyclical
component.

22. Explain the nature of cyclical variations in a time series. How do
seasonal variations differ from them? Give an outline of the moving average
method of measuring seasonal variations.
                [Delhi Uni. B.A. Econ. (Hons.), 1976]


23. What do you understand by irregular fluctuations in a time series ? 
How can they be measured ? 


EXERCISE 11.4


Short and Objective Type Questions
1. Define time series.
2. Explain the various components of a time series.
3. Outline briefly the use of Time Series Analysis.
4. Enumerate the various components of a time series.
5. What do you mean by Secular Trend? Give examples.
6. Explain the meaning of seasonal variations, with illustrations.
7. (a) What are cyclic variations? How are they caused?
   (b) Give the four phases of Business Cycle.
8. How do cyclical variations differ from seasonal variations?
9. What are irregular variations? How are they caused?
10. What do you understand by ‘Additive Model’ in time series analysis?
State clearly the assumptions.
11. What do you understand by ‘Multiplicative Model’ in time series
analysis? State clearly its assumptions.
12. Of the Additive and Multiplicative Models in time series analysis
which is better and why?
13. Enumerate the different methods of estimating
    (i) Trend, (ii) Seasonal variations,
in a time series analysis.
14. Suppose you have fitted a straight line trend
        y = 85.6 + 24x
    Origin 1980; x unit = 1 year,
    y: Annual production of sugar (in ’000 quintals)
(i) What is the slope of the trend line?
(ii) What is the monthly increase in production?
(iii) Does the trend line exhibit an increasing trend or decreasing trend?
(iv) Shift the trend equation to 1975.
(v) Convert the equation to monthly trend.

15. What do you understand by ‘Deseasonalisation of Data’? Explain
by means of an illustration.


16. Fill in the blanks:

(i) Secular trend is the overall tendency of the time series data to
............ or ............ over a ............ period of time.
(ii) Short term variations are classified as:
    (a) ............ (b) ............
(iii) The period of the moving average should be equal to ............
(iv) If the trend is absent in the data, then the seasonal indices are
computed by ............
(v) Cyclical variations are caused by ............
(vi) The time series data exhibits ............ trend if the rate of growth is
constant.
(vii) The least square linear trend equation y = a + bx exhibits a ............
trend if b > 0 and a ............ trend if b < 0.
(viii) The four phases of a business cycle (in order) are ............
(ix) Using Multiplicative Model of Time Series, the time series values
(y) are given by:
        y = ............
    where ............
(x) The annual trend equation:
        y = a + bx,
    [x unit = 1 year; y: annual sales]
reduced to monthly trend equation is
        y = ............
(xi) For the annual data, ............ component is absent.
(xii) Seasonal variations are the short-term variations with period
............
(xiii) The most widely used method of measuring seasonal variations is
............
(xiv) For the additive model in time series analysis, for annual data the
difference Y − T represents ............
(xv) The most important factors causing seasonal variations are ............

Ans. (i) increase, decrease, long.
(ii) (a) Seasonal, (b) Cyclical.
(iii) Period of oscillatory movements.
(iv) Method of Simple Averages.
(v) Trade or Business Cycles.
(vi) Linear, (vii) Rising, declining.
(viii) Economic boom (prosperity), recession, depression and recovery
(improvement).
(ix) Y = T × S × C × I, (x) y = a/12 + (b/144)x.
(xi) Seasonal, (xii) Less than one year.
(xiii) Ratio to M.A. Method, (xiv) Cyclical and Irregular components.
(xv) Weather (seasons) and social customs.
17. With which components of a time series would you mainly associate
each of the following? Why?
(a) (i) A fire in a factory delaying production for three weeks.
    (ii) An era of prosperity.
    (iii) An after-Onam sales spree in a departmental store.
    (iv) A need for increased wheat production due to constant increase
         in population.
    (v) Recession.
                [I.C.W.A. (Final), June 1975]
Ans. (i) Irregular (ii) Cyclical (iii) Seasonal (iv) Long-term trend
(v) Cyclical.

(b) (i) A strike in a factory delaying production for 10 days.
    (ii) A decline in ice-cream sales during November to March.
    (iii) The increase in day temperature from winter to summer.
    (iv) Diwali sales in a departmental store.
    (v) Fall in death rate due to advances in science.
    (vi) Rainfall in Delhi in July 1980.
    (vii) Increase in money in circulation for the last 10 years.
    (viii) Rainfall in Delhi that occurred for a week in December 1979.
    (ix) Inflation.
    (x) An increase in employment during harvest time.

Ans. (i) Irregular (ii) Seasonal (iii) Cyclical (iv) Seasonal (v) Long-term
trend (vi) Seasonal (vii) Trend (viii) Irregular (ix) Cyclical (x) Seasonal.

18. Write down the four characteristic movements of a time series. With
which characteristic movement of a time series would you associate: (i) a
recession, (ii) increasing demand of smaller automobiles, (iii) decline in death
rate due to advances in medical science?
                [I.C.W.A. (Final), Dec. 1979]

Ans. (i) Cyclical, (ii) Seasonal, (iii) Secular Trend.

19. Cyclical fluctuations are caused by:

    (i) Strikes and lockouts        (iii) Wars
    (ii) Floods                     (iv) None of these.
                [C.A. (Intermediate), May 1983]

Ans. (iv).


12

12.1. Introduction. If an experiment is performed repeatedly
under essentially homogeneous and similar conditions, the
result or what is commonly termed as outcome may be classified as
follows:

(a) It is unique or certain.

(b) It is not definite but may be one of the various possibilities
depending on the experiment.

The phenomenon under category (a) where the result can be
predicted with certainty is known as deterministic or predictable
phenomenon. In a deterministic phenomenon, the conditions under
which an experiment is performed uniquely determine the outcome
of the experiment. For instance:

(i) In case of a perfect gas we have Boyle's law which states,

    Pressure × Volume = Constant

    i.e.,   PV = Constant   or   V ∝ 1/P,

provided the temperature remains constant.

(ii) The distance (s) covered by a particle after time (t) is
given by

    s = ut + (1/2) at²

where u is the initial velocity and a is the acceleration.

(iii) If dilute sulphuric acid is added to zinc we get hydrogen.

Thus most of the phenomena in physical and chemical sciences
are of a deterministic nature. However, there exist a number
of phenomena as generated by category (b) where the results cannot
be predicted with certainty and are known as unpredictable or
probabilistic phenomena. Such phenomena are frequently observed in
economics, business and social sciences or even in our day-to-day
life. For example:


(i) In case of a new born baby, the sex cannot be predicted
with certainty.

(ii) A sales (or production) manager cannot say with certainty
if he will achieve the sales (or production) target in the season.

(iii) If an electric bulb or tube has lasted for 3 months, nothing
can be said about its future life.

(iv) In toss of a uniform coin we are not sure if we shall get
head or tail.

(v) A producer cannot ascertain the future demand of his
product with certainty.

Even in our day-to-day life we say or hear phrases like “It may
rain today”; “Probably I will get a first class in the examination”;
“India might draw or win the cricket series against Australia”; and
so on. In all the above cases there is involved an element of uncertainty
or chance. A numerical measure of uncertainty is provided
by a very important branch of Statistics called the ‘Theory of
Probability’. In the words of Prof. Ya-Lin-Chou: “Statistics is the
science of decision making with calculated risks in the face of
uncertainty.”

12.24. Short History. The theory of probability has its origin 
in the games of chance related to gambling, for instance throwing 
of dice or coin, drawing cards from a pack of cards and so on. 
Jerome Cardan (1501-1576) an Italian mathematician was the first 
man to write a book on the subject entitled "Book on Games of 
Chance", (Liber de Ludo Aleae) which was published after his death 
in 1663. ltisa valuable treatise on the hazards of the game of 
chance and contains a number of rules by which the risks of gamb- 
ling could be minimised and one could protect oneself against 
cheating. However, a systematic and scientific foundation of the 
mathematical theory of probability was laid in mid-scventeenth 
century by the French mathematicians Blaise Pascal (1623 62) and 
Pierre de Fermat (1601-65) while solving a problem for sharing the 
stake in an incomplete gambling match posed by a notable French 
gambler and nobleman Chevalier-de-Mere. The lengthy corres- 
pondence between these two mathematicians, who ultimately solved 
the problem, resulted in the scientific development of the subject of 
probability. The next stalwart in this field was the Swiss mathe- 
matician James Bernoulli (1654-1705) who made extensive study of 
the subject for twenty years. His ‘Treatise on Probability’ (Arts 
Conjectandi), which was published posthumously by his nephew in 
1713, is a major contribution to the theory of Probability and 
Combinatorics. A. De-Moivre (1667-1754) also contributed a lot 
to this subject and published his work in his famous book ‘The 
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Doctrines of Chances? in 1718. Thomas Bayes (1702-61) introduced 
the concept of /nverse Probability. The French mathematician 
Pierre.Simon de Laplace (1749-1827) aner an extensive research 
Over a number of years published his monumental work Theorie 
Analytique des Probabilities, (Theory of Analytical Probability), in 
1812. This resulted in what is called the classical theory of probabi- 
lity. R.A. Fisher, Von-Mises introduced the empirical approach to 
the theory of probability through the notion of sample space. 


Russian mathematicians have made very great contributions to 
the modern theory of probability. Main contributors, to mention 
only a few, are Chebychev (1821-94) who founded the Russian 
School of Statisticians ; A. Markov (1856-1922), Khinchine (Law of 
Large Numbers), Liapounoff (Central Limit Theorem), Gnedenko 
and A.N. Kolmogorov. Kolmogorov axiomatised the theory of
probability, and his small book ‘Foundations of Probability’, published
in 1933, introduced probability as a set function and is considered as a


Today, the subject has been developed to a great extent and
there is not even a single discipline in the social, physical or natural
sciences where probability theory is not used. It is extensively used
in the quantitative analysis of business and economic problems. It
is an essential tool in statistical inference and forms the basis of
‘Decision Theory’, viz., decision making in the face of uncertainty
with calculated risks.


12.3. Terminology. As already discussed above, there are
three approaches to probability :


(i) Classical approach,
(ii) Empirical approach,
(iii) Axiomatic approach.


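Random Experiment. An experiment is called a random experiment
if, when conducted repeatedly under essentially homogeneous
conditions, the result is not unique but may be any one of the
various possible outcomes.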


Trial and Event. Performing a random experiment is
called a trial, and an outcome or a combination of outcomes is termed
an event. For example :


*A die is a homogeneous cube with six faces marked with numbers from
1 to 6. The plural of the word die is dice.


(i) If a coin is tossed repeatedly, the result is not unique. We
may get either of the two faces, head or tail. Thus tossing of a coin
is a random experiment or trial and getting a head or a tail is an
event.


(ii) Similarly, throwing of a die is a trial and getting any one 
of the faces 1, 2,...,6 is an event, or getting of an odd number or an 
even number is an event ; or getting a number greater than 4 or 
less than 3 are events. 


(iii) Drawing of two balls from an urn containing ‘a’ red balls
and ‘b’ white balls is a trial, and getting both red balls, or both
white balls, or one red and one white ball are events.


An event is called simple if it corresponds to a single possible
outcome of the experiment or trial; otherwise it is known as a
compound or composite event. Thus, in tossing of a single die, the event
of getting ‘5’ is a simple event but the event ‘getting an even
number’ is a composite event.


Exhaustive Cases. The total number of possible outcomes of
a random experiment is called the exhaustive cases for the
experiment. Thus, in a toss of a single coin, we can get head (H) or tail
(T). Hence the exhaustive number of cases is 2, viz., (H, T). If two
coins are tossed, the various possibilities are HH, HT, TH, TT,
where HT means head on the first coin and tail on the second coin,
TH means tail on the first coin and head on the second coin, and
so on. Thus, in the case of a toss of two coins, the exhaustive number of
cases is 4, i.e., $2^2$. Similarly, in a toss of three coins the possible
outcomes are :


(H, T) × (H, T) × (H, T)
= (HH, HT, TH, TT) × (H, T)
= HHH, HTH, THH, TTH, HHT, HTT, THT, TTT


Therefore, in the case of a toss of 3 coins the exhaustive number of
cases is $8 = 2^3$. In general, in a throw of $n$ coins, the exhaustive
number of cases is $2^n$.


In a throw of a die, the exhaustive number of cases is 6, since we
can get any one of the six faces marked 1, 2, 3, 4, 5 or 6. If two
dice are thrown, the possible outcomes are :


(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

i.e., 36 ordered pairs, where the pair $(i, j)$ means number $i$ on the first
die and $j$ on the second die, $i$ and $j$ both taking the values from 1
to 6. Hence, in the case of a throw of two dice the exhaustive number
of cases is $36 = 6^2$. Thus for a throw of 3 dice the exhaustive number of
cases will be $216 = 6^3$, and for $n$ dice it will be $6^n$.

If $r$ cards are drawn from a pack of $n$ cards, the exhaustive
number of cases is ${}^{n}C_{r} = \binom{n}{r}$, since $r$ cards can be
drawn out of $n$ cards in $\binom{n}{r}$ ways.
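The counts quoted above can be checked by listing the outcomes directly. The following is only a minimal sketch in Python; the language and the helper name `exhaustive_cases` are our own illustrative choices and do not come from the text.

```python
from itertools import product
from math import comb


def exhaustive_cases(faces, n):
    """List every ordered outcome when n objects with the given faces are thrown."""
    return list(product(faces, repeat=n))


# Three coins: each outcome is a tuple such as ('H', 'T', 'H').
coins = exhaustive_cases("HT", 3)

# Two dice: each outcome is an ordered pair (i, j) with i, j in 1..6.
dice = exhaustive_cases(range(1, 7), 2)

print(len(coins))   # number of outcomes for 3 coins, 2**3
print(len(dice))    # number of outcomes for 2 dice, 6**2
print(comb(52, 4))  # number of ways of drawing 4 cards out of 52
```

Listing the outcomes is practical only for small experiments; for larger ones the formulas $2^n$, $6^n$ and $\binom{n}{r}$ are used instead.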


Favourable Cases or Events. The number of outcomes of 
a random experiment which entail (or result in) the happening of 
an event are termed as the cases favourable to the event. For 
example : 


(i) In a toss of two coins, the number of cases favourable to
the event ‘exactly one head’ is 2, viz., HT, TH, and for getting ‘two
heads’ is one, viz., HH.


(ii) In drawing a card from a pack of cards, the cases
favourable to ‘getting a diamond’ are 13 and to ‘getting the ace of
spades’ is only 1.


Mutually Exclusive Events or Cases. Two or more events
are said to be mutually exclusive if the happening of any one of
them excludes the happening of all others in the same experiment.
For example, in a toss of a coin the events ‘head’ and ‘tail’ are
mutually exclusive because if head comes, we can’t get tail and if tail
comes we can’t get head. Similarly, in the throw of a die, the
six faces numbered 1, 2, 3, 4, 5 and 6 are mutually exclusive. Thus,
events are said to be mutually exclusive if no two or more of them
can happen simultaneously.


Equally Likely Cases. The outcomes are said to be equally
likely or equally probable if none of them is expected to occur in
preference to the others. Thus, in tossing of a coin (die), all the
outcomes, viz., H, T (the faces 1, 2, 3, 4, 5, 6) are equally likely if
the coin (die) is unbiased.


Independent Events. Events are said to be independent of 
each other if happening of any one of them is not affected by and 
does not affect the happening of any one of others. For example : 


(i) In tossing of a die repeatedly, the event of getting ‘5’ in
the first throw is independent of getting ‘5’ in the second, third or
subsequent throws.


(ii) In drawing cards from a pack of cards, the result of the
second draw will depend upon the card drawn in the first draw.
However, if the card drawn in the first draw is replaced before
drawing the second card, then the result of the second draw will be
independent of the first draw.


Similarly, drawing of balls from an urn gives independent
events if the draws are made with replacement. If the ball drawn
in the earlier draw is not replaced, the resulting draws will not be
independent.


12.4. Mathematical Preliminaries


12.4.1. Set Theory. A set is a well defined collection or
aggregate of objects having given properties and specified according
to a well defined rule. For example, letters in the English alphabet ;
vowels (or consonants) in the English alphabet ; Prime Ministers of
India ; colleges in Delhi, etc., are all sets. The objects comprising
the set are known as its elements. Sets are usually represented by
the capital letters of the English alphabet, viz., A, B, C, etc. We shall
use the following symbols :


$\in$ : Belongs to
$\notin$ : Does not belong to
$\subset$ : Contained in
$\supset$ : Contains


If $x$ is an element of the set $A$ we write $x \in A$, and if $x$ is not
an element of the set $A$ we write $x \notin A$. A set is written by
enclosing its elements within curly brackets. For example :

A = Set of first 10 natural numbers
  = $\{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$
  = $\{x : x \in N ;\ x \le 10\}$

B = Set of odd positive integers
  = $\{1, 3, 5, 7, \ldots\}$
  = $\{x : x = 2n + 1,\ n = 0, 1, 2, \ldots\}$

Null Set. A set having no element at all is called a null or
an empty set. It is denoted by the symbol $\phi$ (phi). For example,
if two dice are thrown and $A$ is the set of points on the two dice such
that their sum is greater than 12, then $A$ is a null set. Also

$B = \{x : x^2 + 1 = 0,\ x \text{ real}\} = \phi$,

since the solutions of the equation $x^2 + 1 = 0$ are imaginary.


Sub-set. A set $A$ is said to be a proper subset of $B$ if every
element of $A$ is also an element of $B$ and there is at least one
element of $B$ which is not an element of $A$, and we write $A \subset B$.


If the latter restriction is removed, then $A$ is said to be a subset of
$B$ and we write $A \subseteq B$.


Equality of Two Sets. Two sets $A$ and $B$ are said to be equal
if every element of $A$ is an element of $B$ and if every element of $B$ is
an element of $A$. Mathematically,

$A = B$ if $x \in A \Rightarrow x \in B$ and $x \in B \Rightarrow x \in A$

Remarks. 1. Every set is a subset of itself, i.e., $A \subseteq A$.
2. The null set $\phi$ is a subset of every set, i.e., $\phi \subseteq A$.




Universal Set. In any problem, the overall limiting set, of
which all the sets under consideration are subsets, is called a
universal set. We shall denote it by $S$. The universal set will vary
from situation to situation.


ALGEBRA OF SETS


The union of two sets $A$ and $B$, denoted by $A \cup B$, is defined
as the set of elements which belong to either $A$ or $B$ or both.
Symbolically, we write

$A \cup B = \{x : x \in A \text{ or } x \in B\}$

For example, if $A = \{1, 2, 3, 4\}$ and $B = \{3, 4, 5, 6\}$, then

$A \cup B = \{1, 2, 3, 4, 5, 6\}$


The intersection of two sets $A$ and $B$, denoted by $A \cap B$, is
defined as the set whose elements belong to both $A$ and $B$.
Symbolically, we write :

$A \cap B = \{x : x \in A \text{ and } x \in B\}$

Thus, in the above case,

$A \cap B = \{3, 4\}$

Two sets $A$ and $B$ are said to be disjoint or mutually exclusive
if they do not have any common point. Mathematically, $A$ and $B$
are said to be disjoint if their intersection is a null set,
i.e., if $A \cap B = \phi$.

The complement of a set $A$, usually denoted by $\bar{A}$ or $A'$ or
$A^c$, is the set of elements which do not belong to the set $A$ but which
belong to the universal set $S$. Symbolically,

$\bar{A}$ or $A^c = \{x : x \notin A \text{ and } x \in S\}$

Remark. Obviously $A$ and $A^c$ are disjoint, i.e., $A \cap A^c = \phi$.


The difference of two sets $A$ and $B$, denoted by $A - B$, is the
set of elements which belong to $A$ but not to $B$. Symbolically,

$A - B = \{x : x \in A \text{ and } x \notin B\}$

This can also be written as

$A - B = \{x : x \in (A \cap B^c)\}$

Thus $A - B$ is equivalent to $A \cap B^c$.

Laws of Set Theory. If $A$, $B$ and $C$ are subsets of the universal
set $S$, then the following laws hold :

Commutative Laws :
$A \cup B = B \cup A$ (For union)
$A \cap B = B \cap A$ (For intersection)

Associative Laws :
$A \cup (B \cup C) = (A \cup B) \cup C$ (For union)
$A \cap (B \cap C) = (A \cap B) \cap C$ (For intersection)

Distributive Laws :
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
Hence intersection is distributive w.r.t. union and union is
distributive w.r.t. intersection.

Difference Laws :
$A - B = A \cap B^c$
$A - B = A - (A \cap B) = (A \cup B) - B$

Complementary Laws :
$A \cup A^c = S$ ; $A \cap A^c = \phi$
$A \cup S = S$ ($\because A \subset S$) ; $A \cap S = A$
$A \cup \phi = A$ ; $A \cap \phi = \phi$

De-Morgan’s Laws of Complementation :
$(A \cup B)^c = A^c \cap B^c$,
i.e., the complement of the union is equal to the intersection of the
complements.
$(A \cap B)^c = A^c \cup B^c$,
i.e., the complement of the intersection is equal to the union of the
complements.


The various operations on sets, viz., union, intersection,
difference and complementation, can be expressed diagrammatically
through Venn diagrams :

[Fig. 12.1. Union of two sets : $A \cup B = B \cup A$ = shaded region.]
[Fig. 12.2. Intersection of two sets : $A \cap B = B \cap A$ = shaded region.]
[Fig. 12.3. Disjoint sets.]
[Fig. 12.4. Complement of a set : $A^c = S - A$ = shaded region.]

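The laws of complementation can be generalised to $n$ sets. If
$A_i \subset S$ ; $i = 1, 2, \ldots, n$, then

$$\Big(\bigcup_{i=1}^{n} A_i\Big)^c = \bigcap_{i=1}^{n} A_i^c$$

and

$$\Big(\bigcap_{i=1}^{n} A_i\Big)^c = \bigcup_{i=1}^{n} A_i^c$$

Idempotency Law :

$A \cup A = A$ and $A \cap A = A$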
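The set operations and laws listed above can be tried out on the sets A = {1, 2, 3, 4} and B = {3, 4, 5, 6} used in the text. The sketch below uses Python's built-in set type; the universal set S = {1, ..., 10} and the helper `complement` are assumptions made purely for illustration.

```python
# Universal set: assumed here to be the first 10 natural numbers.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}


def complement(X, universe=S):
    """Elements of the universal set which do not belong to X."""
    return universe - X


union = A | B            # A union B
intersection = A & B     # A intersection B
difference = A - B       # elements of A which are not in B

# De-Morgan's laws checked on this particular pair of sets.
de_morgan_union = complement(A | B) == complement(A) & complement(B)
de_morgan_intersection = complement(A & B) == complement(A) | complement(B)

print(union, intersection, difference)
print(de_morgan_union, de_morgan_intersection)
```

A check on one pair of sets is, of course, only an illustration of the laws and not a proof of them.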


12.4.2. Permutation and Combination. The word permutation
in simple language means ‘arrangement’ and the word combination
means ‘group’ or ‘selection’. Let us consider three letters A,
B and C. The permutations of these three letters taken two at a
time will be AB, BC, CA, BA, CB and AC, i.e., 6 in all, whereas the
combinations of three letters taken two at a time will be AB, BC
and CA, i.e., 3 in all. It should be noted that in combinations, the
order of the elements (letters in this case) is immaterial, i.e., AB and
BA form the same combination but these are different
arrangements. Similarly, in the case of 4 letters A, B, C, D, the
combinations taking three at a time are ABC, ABD, ACD, BCD, i.e.,
4 in all. However, each of these combinations gives six different
arrangements. For example, the different arrangements of the
combination ABC are ABC, ACB, BAC, BCA, CAB, CBA.


Hence, the total number of permutations (arrangements) of
4 letters taking 3 at a time is $4 \times 6 = 24$.
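The distinction between arrangements and selections drawn above can be seen by listing both for the four letters A, B, C, D. This is a minimal sketch using Python's standard `itertools` and `math` modules; the variable names are ours.

```python
from itertools import combinations, permutations
from math import comb, perm

letters = "ABCD"

# Ordered arrangements of 3 letters out of 4 (order matters).
arrangements = ["".join(p) for p in permutations(letters, 3)]

# Unordered selections of 3 letters out of 4 (order is immaterial).
selections = ["".join(c) for c in combinations(letters, 3)]

print(selections)
print(len(arrangements), perm(4, 3))
print(len(selections), comb(4, 3))
```

Each of the selections gives rise to 3! = 6 arrangements, which is why the two counts differ by a factor of 6.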


Permutation (Definition). A permutation of $n$ different objects
taken $r$ at a time, denoted by ${}^{n}P_{r}$, is an ordered arrangement of only
$r$ objects of the $n$ objects.


We shall now state, without proof, some important results on
permutations in the form of theorems.


Theorem 12.1. The number of different permutations of $n$
different objects taken $r$ at a time without repetition is

$${}^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) \tag{12.1}$$

i.e., it is a continued product of $r$ factors starting with $n$ and
differing by unity. For example :

$${}^{3}P_{2} = 3 \times 2 = 6$$
$${}^{4}P_{3} = 4 \times 3 \times 2 = 24$$

In particular, the total number of permutations of $n$ distinct
objects, taken all at a time, is given by :

$${}^{n}P_{n} = n(n-1)(n-2)\cdots 1 \qquad \text{[Take } r = n \text{ in (12.1)]}$$
$$\Rightarrow\ {}^{n}P_{n} = n! \tag{12.2}$$

Remarks 1. Factorial Notation. The product of the first $n$ natural
numbers, viz., 1, 2, 3, ..., $n$ is called factorial $n$ or $n$-factorial and is
written as $n!$ or $\lfloor n$. Thus,

$$n! = 1 \times 2 \times 3 \times \cdots \times (n-1) \times n \tag{12.3}$$

Rewriting, we have

$$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$$
$$\Rightarrow\ n! = n[(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1]$$
$$\Rightarrow\ n! = n\,(n-1)! \tag{12.4}$$

Repeated application of (12.4) gives :

$$n! = n(n-1)\,(n-2)! = n(n-1)(n-2)\,(n-3)!$$

and so on. For example, we have :

$$5! = 5 \times 4 \times 3 \times 2 \times 1 = 120 = 5 \times 4! = 5 \times 4 \times 3!$$

and so on.

By convention we take $0! = 1$, i.e., 0-factorial is defined as 1.

2. We have

$${}^{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) = \frac{n(n-1)(n-2)\cdots(n-r+1)\,(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1}{(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1}$$
$$\Rightarrow\ {}^{n}P_{r} = \frac{n!}{(n-r)!} \tag{12.5}$$


a form which is much more convenient to remember and use for
computational purposes.

Theorem 12.2. The number of different permutations of $n$
different (distinct) objects, taken $r$ at a time with repetition, is :

$${}^{n}P_{r} = n^{r} \tag{12.6}$$

In particular, ${}^{n}P_{n} = n^{n}$.


Theorem 12.3. The number of permutations of $n$ different
objects all at a time round a circle is $(n-1)!$.


Theorem 12.4 (Permutation of objects not all distinct). The
number of permutations of $n$ objects taken all at a time, when $n_1$
objects are alike of one kind, $n_2$ objects are alike of a second kind, ...,
$n_k$ objects are alike of the $k$th kind, is given by

$$\frac{n!}{n_1!\; n_2! \cdots n_k!} \tag{12.7}$$

For example, the total number of arrangements of the letters
of the word ALLAHABAD taken all at a time is given by :

$$\frac{9!}{4!\; 2!} = \frac{9 \times 8 \times 7 \times 6 \times 5}{2} = 7560$$

because in this word there are 9 letters, out of which 4 are of one
kind, i.e., A ; 2 are of a second kind, i.e., L ; and the rest are all
different, each occurring once, and $1! = 1$.
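The rule of Theorem 12.4 lends itself to a short computation. The sketch below, with a helper name `arrangements_with_repeats` of our own choosing, counts how often each letter occurs and divides n! by the factorial of each count.

```python
from collections import Counter
from math import factorial


def arrangements_with_repeats(word):
    """Number of distinct arrangements of all the letters of word."""
    total = factorial(len(word))
    # Divide by n_i! for every group of n_i alike letters.
    for count in Counter(word).values():
        total //= factorial(count)
    return total


print(arrangements_with_repeats("ALLAHABAD"))
```

For a word with all letters distinct every count is 1, and the function reduces to n!, in agreement with (12.2).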


Theorem 12.5 (Fundamental Rule of Counting). If one
operation can be performed in $p$ different ways and another operation
can be performed in $q$ different ways, then the two operations when
associated together can be performed in $p \times q$ ways.


The result can be generalised to more than two operations.


For example, if there are five routes of journey from place A
to place B, then the total number of ways of making a return
journey (i.e., going from A to B and then coming back from B to A)
is $5 \times 5 = 25$, since one can go from A to B in 5 ways and come
back from B to A in 5 ways, and any one of the ways of going can
be associated with any one of the ways of coming.


Combination (Definition). A combination of $n$ different objects
taken $r$ at a time, denoted by ${}^{n}C_{r}$ or $\binom{n}{r}$, is a selection of only $r$
objects out of the $n$ objects, without any regard to the order of
arrangement.


Theorem 12.6. The number of different combinations of $n$
different objects taken $r$ at a time, without repetition, is

$${}^{n}C_{r} = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}, \quad r \le n \tag{12.8}$$
$$= \frac{{}^{n}P_{r}}{r!} \qquad \text{[Using (12.5)]} \tag{12.8 a}$$

and with repetition is $\binom{n+r-1}{r}$ or ${}^{n+r-1}C_{r}$.


Remark. ${}^{n}C_{0}, {}^{n}C_{1}, \ldots, {}^{n}C_{n}$ are known as Binomial Coefficients.
We have ${}^{n}C_{0} = 1 = {}^{n}C_{n}$.
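The formula (12.8) is available in Python as `math.comb`, and the same few lines can be used to check numerically, for small $n$, the three identities stated in Theorems 12.7 to 12.9 that follow. The function name `check_binomial_identities` is our own; a numerical check of this kind illustrates the results but does not prove them.

```python
from math import comb, factorial


def check_binomial_identities(n):
    """Return three booleans: symmetry, addition rule, and sum of coefficients."""
    symmetry = all(comb(n, r) == comb(n, n - r) for r in range(n + 1))
    addition = all(
        comb(n, r) + comb(n, r - 1) == comb(n + 1, r) for r in range(1, n + 1)
    )
    total = sum(comb(n, r) for r in range(n + 1)) == 2 ** n
    return symmetry, addition, total


# Formula (12.8) written out with factorials, compared with math.comb.
n, r = 15, 3
print(factorial(n) // (factorial(r) * factorial(n - r)), comb(n, r))
print(check_binomial_identities(10))
```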


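Theorem 12.7. ${}^{n}C_{r} = {}^{n}C_{n-r}$ ...(12.9)
Theorem 12.8. ${}^{n}C_{r} + {}^{n}C_{r-1} = {}^{n+1}C_{r}$ ...(12.10)
Theorem 12.9. (Sum of Binomial Coefficients)

${}^{n}C_{0} + {}^{n}C_{1} + {}^{n}C_{2} + \cdots + {}^{n}C_{n} = 2^{n}$ ...(12.11)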


12.5. Mathematical or Classical or ‘a Priori’ Probability 


Definition. If a random experiment results in $N$ exhaustive,
mutually exclusive and equally likely outcomes (cases) out of which $m$
are favourable to the happening of an event $A$, then the probability of
occurrence of $A$, usually denoted by $P(A)$, is given by :

$$P(A) = \frac{\text{Favourable number of cases to } A}{\text{Exhaustive number of cases}} = \frac{m}{N} \tag{12.12}$$

This definition was given by James Bernoulli, who was the first
man to obtain a quantitative measure of uncertainty.


Remarks. 1. Obviously, the number of cases favourable to the
complementary event $\bar{A}$, i.e., non-happening of the event $A$, is $(N - m)$
and hence, by definition, the probability of non-occurrence of $A$ is
given by :

$$P(\bar{A}) = \frac{\text{Favourable number of cases to } \bar{A}}{\text{Exhaustive number of cases}} = \frac{N-m}{N} = 1 - \frac{m}{N}$$
$$\Rightarrow\ P(\bar{A}) = 1 - P(A) \tag{12.13}$$
$$\Rightarrow\ P(A) + P(\bar{A}) = 1 \tag{12.14}$$


2. Since $m$ and $N$ are non-negative integers, $P(A) \ge 0$.
Further, since the favourable number of cases to $A$ can never exceed
the total number of cases $N$, i.e., $m \le N$, we have $P(A) \le 1$.
Hence the probability of any event is a number lying between 0 and 1,
i.e.,

$$0 \le P(A) \le 1 \tag{12.15}$$

for any event $A$. If $P(A) = 0$, then $A$ is called an impossible or null
event. If $P(A) = 1$, then $A$ is called a certain event.


3. The probability of happening of the event $A$, i.e., $P(A)$, is
also known as the probability of success and is usually written as $p$,
and the probability of the non-happening, i.e., $P(\bar{A})$, is known as
the probability of failure, which is usually denoted by $q$. Thus,
from (12.13) and (12.14) we get

$$q = 1 - p \ \Rightarrow\ p + q = 1 \tag{12.16}$$


4. According to the above definition, the probability of
getting a head in a toss of an unbiased coin is 1/2, since the two
exhaustive cases H and T (assuming the coin does not stand on its edge) are
mutually exclusive and equally likely and one is favourable to
getting a head. Similarly, in drawing a card from a well shuffled pack
of cards the probability of getting an ace is 4/52 = 1/13. Thus, the
classical definition of probability does not require actual
experimentation, i.e., no experimental data are needed for its computation,
nor is it based on previous experience. It enables us to obtain
probability by logical reasoning prior to making any actual trials and
hence it is also known as ‘a Priori’ or theoretical or mathematical
probability.


5. Limitations. The classical probability has its shortcomings
and fails in the following situations :


(i) If $N$, the exhaustive number of outcomes of the random
experiment, is infinite.


(ii) If the various outcomes of the random experiment are not
equally likely. For example, if a person jumps from the top of
Qutab Minar, then the probability of his survival will not be 50%,
since in this case the two mutually exclusive and exhaustive
outcomes, viz., survival and death, are not equally likely.


(iii) If the actual value of $N$ is not known. Suppose an urn
contains some balls of two colours, say red and white, their number
being unknown. If we actually draw the balls from the urn, then
we may form some idea about the ratio of red to the white balls in


such a situation regarding the probability of drawing a white or a
red ball from the urn. This drawback is overcome in the statistical
or empirical probability, which we discuss below.


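Statistical or Empirical Probability. If an experiment is repeated
under essentially homogeneous and identical conditions, then the
limiting value of the ratio of the number of times the event
occurs to the number of trials, as the number of trials becomes
indefinitely large, is called the probability of happening of the event, it
being assumed that the limit is finite and unique. Symbolically, if in
$N$ trials an event $A$ happens $m$ times, then

$$P(A) = \lim_{N \to \infty} \frac{m}{N} \tag{12.17}$$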


becomes sufficiently large, it
is called the probability of

Remarks 1. Since in the relative frequency approach the
probability is obtained objectively by repetitive empirical
observations, it is also known as Empirical Probability.


2. The empirical probability provides validity to the classical
theory of probability. If an unbiased coin is tossed at random, then
the classical probability gives the probability of a head as 1/2. Thus,
if we toss an unbiased coin 20 times, then classical probability
suggests we should have 10 heads. However, in practice, this will not
generally be true. In fact, in 20 throws of a coin, we may get no
head at all or 1 or 2 heads. However, the empirical probability
suggests that if a coin is tossed a large number of times, say 500
times, we should on the average expect 50% heads and 50% tails.
Thus, the empirical probability approaches the classical probability
as the number of trials becomes indefinitely large.
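The behaviour described in Remark 2 can be imitated on a computer by simulating the tosses of an unbiased coin. The following is only a sketch: the function name `relative_frequency_of_heads` and the fixed seed (used so that a run can be repeated) are our own assumptions, and a pseudo-random number generator stands in for an actual coin.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Toss a simulated unbiased coin n_tosses times; return heads / tosses."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency m/N for an increasing number of trials N.
for n in (20, 500, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For a small number of tosses the relative frequency may differ noticeably from 1/2, while for a large number of tosses it is expected to lie close to 1/2.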


3. Limitations. It may be remarked that the empirical
probability $P(A)$ defined in (12.17) can never be obtained in practice
and we can only attempt a close estimate of $P(A)$ by making $N$
sufficiently large. The following are the limitations of this
definition :

(i) The experimental conditions may not remain essentially
homogeneous and identical in a large number of repetitions of the
experiment.

(ii) The relative frequency $m/N$ may not attain a unique value,
no matter how large $N$ may be.


Example 12.1. A uniform die is thrown at random. Find the
probability that the number on it is :


(i) 5, (ii) greater than 4, (iii) even.


Solution. Since the die can fall with any one of the faces
1, 2, 3, 4, 5 and 6, the exhaustive number of cases is 6.


(i) The number of cases favourable to the event of getting ‘5’
is only 1. Hence required probability = 1/6.


(ii) The number of cases favourable to the event of getting a
number greater than 4 is 2, viz., 5 and 6.

$\therefore$ Required probability = 2/6 = 1/3.


(iii) Favourable cases for getting an even number are 2, 4 and
6, i.e., 3 in all.

$\therefore$ Required probability = 3/6 = 1/2.


Example 12.2. In a single throw with two uniform dice find the
probability of throwing (i) Five, (ii) Eight.


Solution. Exhaustive number of cases in a single throw with
two dice is $6^2 = 36$.


(i) A sum of ‘5’ can be obtained on the two dice in the
following mutually exclusive ways :

(1, 4), (4, 1), (2, 3), (3, 2), i.e., 4 cases in all, where the first
and second numbers in the bracket ( ) refer to the numbers on the
1st and 2nd dice respectively.

$\therefore$ Required probability = 4/36 = 1/9.


(ii) The cases favourable to the event of getting a sum of 8 on
two dice are :

(2, 6), (6, 2), (3, 5), (5, 3), (4, 4), i.e., 5 distinct cases in all.

$\therefore$ Required probability = 5/36.
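The classical definition (12.12) amounts to counting favourable cases among the exhaustive cases, and for dice this counting can be left to a short program. A minimal sketch follows; the function name `probability_of_sum` is ours, and exact fractions are used so that the answers appear in the same form as in the solutions above.

```python
from fractions import Fraction
from itertools import product


def probability_of_sum(target, n_dice=2):
    """P(sum of the faces equals target) in a single throw of n_dice uniform dice."""
    outcomes = list(product(range(1, 7), repeat=n_dice))  # exhaustive cases
    favourable = [o for o in outcomes if sum(o) == target]
    return Fraction(len(favourable), len(outcomes))       # m / N


print(probability_of_sum(5))
print(probability_of_sum(8))
```

The same function with `n_dice=1` reproduces part (i) of Example 12.1.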


Example 12.3. Four cards are drawn at random from a pack of
52 cards. Find the probability that

(i) They are a king, a queen, a jack and an ace.
(ii) Two are kings and two are aces.
(iii) All are diamonds.
(iv) Two are red and two are black.
(v) There is one card of each suit.
(vi) There are two cards of clubs and two cards of diamonds.


Solution. Four cards can be drawn from a well shuffled
pack of 52 cards in ${}^{52}C_{4}$ ways, which gives the exhaustive number
of cases.


(i) 1 king can be drawn out of the 4 kings in ${}^{4}C_{1} = 4$ ways.
Similarly, 1 queen, 1 jack and an ace can each be drawn in ${}^{4}C_{1} = 4$
ways. Since any one of the ways of drawing a king can be
associated with any one of the ways of drawing a queen, a jack and an ace,
the favourable number of cases is ${}^{4}C_{1} \times {}^{4}C_{1} \times {}^{4}C_{1} \times {}^{4}C_{1}$.

Hence required probability
$$= \frac{{}^{4}C_{1} \times {}^{4}C_{1} \times {}^{4}C_{1} \times {}^{4}C_{1}}{{}^{52}C_{4}} = \frac{256}{{}^{52}C_{4}}$$


(ii) Required probability
$$= \frac{{}^{4}C_{2} \times {}^{4}C_{2}}{{}^{52}C_{4}}$$


(iii) Since 4 cards can be drawn out of 13 cards (since there
are 13 cards of diamonds in a pack of cards) in ${}^{13}C_{4}$ ways,

Required probability
$$= \frac{{}^{13}C_{4}}{{}^{52}C_{4}}$$


(iv) Since there are 26 red cards (of diamonds and hearts) and
26 black cards (of spades and clubs) in a pack of cards,

Required probability
$$= \frac{{}^{26}C_{2} \times {}^{26}C_{2}}{{}^{52}C_{4}}$$


(v) Since in a pack of cards there are 13 cards of each suit,

Required probability
$$= \frac{{}^{13}C_{1} \times {}^{13}C_{1} \times {}^{13}C_{1} \times {}^{13}C_{1}}{{}^{52}C_{4}}$$


(vi) Required probability
$$= \frac{{}^{13}C_{2} \times {}^{13}C_{2}}{{}^{52}C_{4}}$$
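The six answers of Example 12.3 are left in terms of combinations; they can be reduced to numbers with `math.comb`. The sketch below is ours (the dictionary `p` and its keys are illustrative labels for the six parts) and simply evaluates the expressions obtained in the solution.

```python
from fractions import Fraction
from math import comb

total = comb(52, 4)  # exhaustive number of cases

# One entry for each part of Example 12.3.
p = {
    "i": Fraction(comb(4, 1) ** 4, total),     # king, queen, jack and ace
    "ii": Fraction(comb(4, 2) ** 2, total),    # two kings and two aces
    "iii": Fraction(comb(13, 4), total),       # all diamonds
    "iv": Fraction(comb(26, 2) ** 2, total),   # two red and two black
    "v": Fraction(comb(13, 1) ** 4, total),    # one card of each suit
    "vi": Fraction(comb(13, 2) ** 2, total),   # two clubs and two diamonds
}

for part, value in p.items():
    print(part, value, round(float(value), 4))
```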


Example 12.4. What is the chance that a non-leap year should
have fifty-three Sundays ?


Solution. A non-leap year consists of 365 days, i.e., 52 full
weeks and one over-day. A non-leap year will consist of 53
Sundays if this over-day is a Sunday. This over-day can be any one of
the following possible outcomes :


(i) Sunday (ii) Monday (iii) Tuesday (iv) Wednesday
(v) Thursday (vi) Friday (vii) Saturday, i.e., 7 outcomes in all. Of
these, the number of ways favourable to the required event, viz., the
over-day being Sunday, is 1.


Hence required probability = 1/7.


Example 12.5. A bag contains 20 tickets marked with numbers
1 to 20. One ticket is drawn at random. Find the probability that it
will be a multiple of (i) 2 or 5, (ii) 3 or 5.

(Bombay Uni. B. Com. May 1978)


Solution. One ticket can be drawn out of 20 tickets in ${}^{20}C_{1}$
= 20 ways, which determines the exhaustive number of cases.


(i) The number of cases favourable to getting the ticket
number which is :

(a) a multiple of 2 are 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, i.e., 10
cases.


(b) a multiple of 5 are 5, 10, 15, 20, i.e., 4 cases.


Of these, two cases, viz., 10 and 20, are duplicated. Hence the
number of distinct cases favourable to getting a number which is a
multiple of 2 or 5 is 10 + 4 − 2 = 12.


$\therefore$ Required probability = 12/20 = 3/5 = 0.6


(ii) The cases favourable to getting a multiple of 3 are
3, 6, 9, 12, 15, 18, i.e., 6 cases in all, and to getting a multiple of 5 are
5, 10, 15, 20, i.e., 4 in all. Of these, one case, viz., 15, is duplicated.
Hence the number of distinct cases favourable to getting a multiple
of 3 or 5 is 6 + 4 − 1 = 9.


$\therefore$ Required probability = 9/20 = 0.45
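The care taken above not to count the duplicated tickets twice is handled automatically if the favourable tickets are simply listed. A minimal sketch, with a function name `prob_multiple_of_either` of our own choosing:

```python
from fractions import Fraction


def prob_multiple_of_either(a, b, n_tickets=20):
    """P(ticket number is a multiple of a or of b), tickets numbered 1..n_tickets."""
    # A ticket divisible by both a and b appears only once in this list.
    favourable = [t for t in range(1, n_tickets + 1) if t % a == 0 or t % b == 0]
    return Fraction(len(favourable), n_tickets)


print(prob_multiple_of_either(2, 5))
print(prob_multiple_of_either(3, 5))
```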


Example 12.6. A bag contains 4 white, 5 red and 6 green
balls. Three balls are drawn at random. What is the chance that a
white, a red and a green ball is drawn ?

[Meerut Uni. M. Com. 1975 ; Punjab Uni. B. Com. 1974]


Solution. There are 4 + 5 + 6 = 15 balls in the bag. Three
balls can be drawn out of 15 in ${}^{15}C_{3}$ ways. Hence, the exhaustive
number of cases is

$${}^{15}C_{3} = \frac{15 \times 14 \times 13}{3!} = 5 \times 7 \times 13$$

One white ball can be drawn out of the 4 white balls in ${}^{4}C_{1}$
ways ; one red ball can be drawn out of the 5 red balls in ${}^{5}C_{1}$ ways
and one green ball can be drawn out of the 6 green balls in ${}^{6}C_{1}$
ways. Since any one of the ways of drawing a white ball can be
associated with any one of the ways of drawing a red ball and a
green ball, the required number of favourable cases becomes

$${}^{4}C_{1} \times {}^{5}C_{1} \times {}^{6}C_{1} = 4 \times 5 \times 6$$

Hence required probability
$$= \frac{4 \times 5 \times 6}{5 \times 7 \times 13} = \frac{24}{91}$$


Example 12.7. An urn contains 8 white and 3 red balls. If two
balls are drawn at random, find the probability that (i) both are white,
(ii) both are red, (iii) one is of each colour.

[Calcutta Uni. B.A. Econ. (Hons.) 1973]


Solution. Total number of balls in the urn is 8 + 3 = 11. Since
2 balls can be drawn out of 11 balls in ${}^{11}C_{2}$ ways,

Exhaustive number of cases $= {}^{11}C_{2} = \dfrac{11 \times 10}{2} = 55$

(i) If both the drawn balls are white, they must be selected out
of the 8 white balls and this can be done in ${}^{8}C_{2} = \dfrac{8 \times 7}{2 \times 1} = 28$ ways.

$\therefore$ Probability that both the balls are white = 28/55


(ii) If both the drawn balls are red, they must be drawn out
of the 3 red balls and this can be done in ${}^{3}C_{2} = 3$ ways. Hence the
probability that both the drawn balls are red = 3/55


(iii) The number of favourable cases for drawing one white
ball and one red ball is

$${}^{8}C_{1} \times {}^{3}C_{1} = 8 \times 3 = 24$$

$\therefore$ Probability that one ball is white and the other is red = 24/55
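Examples 12.6 and 12.7 follow one pattern: the favourable cases are a product of combinations, one for each colour, and the exhaustive cases are a single combination. This pattern can be written once, as in the sketch below. The function `prob_colour_counts` and its dictionary arguments are our own illustrative device, and the draws are assumed to be made without replacement, as in the examples.

```python
from fractions import Fraction
from math import comb, prod


def prob_colour_counts(bag, wanted):
    """bag and wanted map colour -> number of balls (in the bag / to be drawn)."""
    n_drawn = sum(wanted.values())
    favourable = prod(comb(bag[colour], k) for colour, k in wanted.items())
    exhaustive = comb(sum(bag.values()), n_drawn)
    return Fraction(favourable, exhaustive)


bag = {"white": 4, "red": 5, "green": 6}          # Example 12.6
print(prob_colour_counts(bag, {"white": 1, "red": 1, "green": 1}))

urn = {"white": 8, "red": 3}                      # Example 12.7
print(prob_colour_counts(urn, {"white": 2}))
print(prob_colour_counts(urn, {"red": 2}))
print(prob_colour_counts(urn, {"white": 1, "red": 1}))
```

In Example 12.7 the three events are mutually exclusive and exhaustive, so the three probabilities add up to 1, which provides a useful check on the arithmetic.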


Example 12.8. The letters of the word ‘article’ are arranged
at random. Find the probability that the vowels may occupy the even
places.


Solution. The word ‘article’ contains 7 distinct letters which
can be arranged among themselves in 7! ways. Hence the exhaustive
number of cases is 7!.


In the word ‘article’ there are 3 vowels, viz., a, i and e, and
these are to be placed in the three even places, viz., the 2nd, 4th and 6th
places. This can be done in 3! ways. For each arrangement, the
remaining 4 consonants can be arranged in 4! ways. Hence,
associating these two operations, the number of favourable cases for the
vowels to occupy even places is $3! \times 4!$.

$\therefore$ Required probability
$$= \frac{3!\; 4!}{7!} = \frac{3!}{7 \times 6 \times 5} = \frac{1}{35}$$


Example 12.9. The letters of the word ‘failure’ are arranged
at random. Find the probability that the consonants may occupy only
odd positions.


Solution. There are 7 distinct letters in the word ‘failure’ and
they can be arranged among themselves in 7! ways, which gives the
exhaustive number of cases.


In the word ‘failure’ there are 4 vowels, viz., a, i, u and e, and
3 consonants, viz., f, l, r. These 3 consonants are to be placed in
3 of the 4 odd places, viz., 1st, 3rd, 5th and 7th, and the places can be
chosen in ${}^{4}C_{3}$ ways. Further, these 3 consonants can be arranged among
themselves in 3! ways and the remaining 4 vowels can be arranged
among themselves in 4! ways. Associating all these operations, the
total number of favourable cases for the consonants to occupy only
odd positions is ${}^{4}C_{3} \times 3! \times 4!$.


Hence required probability
$$= \frac{{}^{4}C_{3} \times 3! \times 4!}{7!} = \frac{4 \times 3!}{7 \times 6 \times 5} = \frac{4}{35}$$
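Since a seven-letter word has only 7! = 5040 arrangements, the answers to Examples 12.8 and 12.9 can be confirmed by running through every arrangement. The sketch below does this by brute force; the function `prob_positions` and its arguments are our own, and places are counted from 1 as in the text.

```python
from fractions import Fraction
from itertools import permutations

VOWELS = set("aeiou")


def prob_positions(word, letters_are_vowels, parity):
    """P(all vowels, or all consonants, fall on places of the given parity).

    parity = 0 means even places (2nd, 4th, ...); parity = 1 means odd places.
    The letters of word are assumed to be distinct.
    """
    favourable = total = 0
    for arrangement in permutations(word):
        total += 1
        places = [
            i for i, ch in enumerate(arrangement, start=1)
            if (ch in VOWELS) == letters_are_vowels
        ]
        if all(i % 2 == parity for i in places):
            favourable += 1
    return Fraction(favourable, total)


print(prob_positions("article", True, 0))    # vowels in even places
print(prob_positions("failure", False, 1))   # consonants in odd places
```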


Example 12.10. $n$ persons are seated on $n$ chairs at a round
table. Find the probability that two specified persons are sitting next
to each other. [Delhi Uni. B. A. (Econ. Hons.) 1982]


Solution. Let us suppose that of the $n$ persons, two persons, say,
A and B, are to be seated together at a round table. After one of
these two persons, say, A, occupies a chair, the other person B can
occupy any one of the remaining $(n-1)$ chairs. Out of these $(n-1)$
seats, the number of seats favourable to making B sit next to A is
2 (since B can sit on either side of A). Hence the required
probability is $2/(n-1)$.
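The result $2/(n-1)$ can be checked for small values of $n$ by going through every possible seating. In the sketch below, which is ours and not part of the original solution, the persons are numbered 0 to n−1 with persons 0 and 1 playing the roles of A and B, and the chairs are numbered 0 to n−1 round the table. It is assumed that $n \ge 3$, so that the two chairs beside A are distinct.

```python
from fractions import Fraction
from itertools import permutations


def prob_adjacent(n):
    """P(persons 0 and 1 occupy neighbouring chairs at a round table of n chairs)."""
    favourable = total = 0
    for seating in permutations(range(n)):      # seating[chair] = person
        total += 1
        a, b = seating.index(0), seating.index(1)
        if (a - b) % n in (1, n - 1):           # chairs next to each other on the circle
            favourable += 1
    return Fraction(favourable, total)


for n in range(3, 8):
    print(n, prob_adjacent(n), Fraction(2, n - 1))
```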

Example 12.11. There are four hotels in a certain town. If 3 men
check into hotels in a day, what is the probability that each checks
into a different hotel ? [C.A. (Intermediate), November 1983]


Solution. Since each man can check into any one of the four
hotels in ${}^{4}C_{1} = 4$ ways, the 3 men can check into 4 hotels in
$4 \times 4 \times 4 = 64$ ways, which gives the exhaustive number of cases.

If the three men are to check into different hotels, then the first man
can check into any one of the 4 hotels in ${}^{4}C_{1} = 4$ ways ; the second
man can check into any one of the remaining 3 hotels in ${}^{3}C_{1} = 3$
ways ; and the third man can check into any one of the remaining
two hotels in ${}^{2}C_{1} = 2$ ways. Hence, the favourable number of cases for
each man checking into a different hotel is :

$${}^{4}C_{1} \times {}^{3}C_{1} \times {}^{2}C_{1} = 4 \times 3 \times 2 = 24$$

$\therefore$ Required probability = 24/64 = 3/8 = 0.375
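The 64 exhaustive cases of Example 12.11 are few enough to be listed in full, each case being a triple showing the hotel chosen by each of the three men. A minimal sketch, with a function name `prob_all_different` of our own and the hotels numbered 0 to 3 for convenience:

```python
from fractions import Fraction
from itertools import product


def prob_all_different(n_hotels=4, n_men=3):
    """P(every man checks into a different hotel), each hotel being equally likely."""
    choices = list(product(range(n_hotels), repeat=n_men))       # exhaustive cases
    favourable = [c for c in choices if len(set(c)) == n_men]    # all hotels distinct
    return Fraction(len(favourable), len(choices))


print(prob_all_different(), float(prob_all_different()))
```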


EXERCISE 12.1 
l. Explain the concept of probability following : 
G) Mathematical or ‘a Prior’ approach, 
(ii) Relative frequency or Empirical approach, 
2. (a) Define random experiment, trial and event. 


а (5). What do you understand by (i) equally likely (èi) mutually exclu- 
sive and (iii) independent events, 


(c) Define independent and mutually exclusive events. Can two events 


be mutually exclusive and independent simultaneously ? Support your answer 
with an example, [Delhi Uni. M.B.A. 1973] 


3. (a) Discuss the different schools of thought on the interpretation of 
Probability, How does each school define Probability ? 


a (b) Describe briefly the various schools of thought on probability. 
Discuss their limitations, if any. [Delhi Uni. M.B.A. 1976] 


4. (а) In a Single throw of two dice what is the probability of getting 
(0) a total of 8 ; and (ii) Total different from 8 : 


Ans. (i) 5/36, (ii) 31/36. 
(b) Prove that in a single throw with a pair of dice the probability of. 
the sum 


getting of 7 is equal to 1/6 and the probability of Betting the sum of 
10 is equal to 1/12. 


5. In a single throw of two dice, find
(i) P (odd number on first die and 6 on the second),
(ii) P (a number > 4 on each die),
(iii) P (a total of 11),
(iv) P (a total of 9 or 11),
(v) P (a total greater than 8).

Ans. (i) 1/12, (ii) 1/9, (iii) 1/18, (iv) 1/6, (v) 5/18.

6. Tickets are numbered from 1 to 100. They are well shuffled and a ticket is drawn at random. What is the probability that the drawn ticket has:
(a) an even number?
(b) a number 5 or a multiple of 5?
(c) a number which is greater than 75?
(d) a number which is a square?
[Himachal Pradesh Uni. M.B.A. 1979]

Ans. (a) 0.5, (b) 0.2, (c) 0.25, (d) 0.10.


7. There are 17 balls, numbered from 1 to 17, in a bag. If a person selects one ball at random, what is the probability that the number printed on the ball will be an even number greater than 9? [C.A. (Intermediate), Nov. 1985]

Ans. 4/17.

8. An integer is chosen at random from the first 200 positive integers. What is the probability that the integer chosen is divisible by 6 or 8?

Ans. 1/4.

9. One ticket is drawn at random from a bag containing 30 tickets numbered from 1 to 30. Find the probability that
(i) it is a multiple of 5 or 7,
(ii) it is a multiple of 3 or 5.

Ans. (i) 1/3, (ii) 7/15.

10. A number is chosen from each of the two sets:

1, 2, 3, 4, 5, 6, 7, 8, 9; 1, 2, 3, 4, 5, 6, 7, 8, 9.

If $p_1$ is the probability that the sum of the two numbers be 10 and $p_2$ the probability that their sum be 8, find $p_1 + p_2$.

Ans. 16/81.

11. A bag contains 7 white and 9 black balls. Two balls are drawn in succession at random. What is the probability that one of them is white and the other is black? [Madras Uni. M.B.A. 1976]

Ans. 21/40.

12. A bag contains eight balls, five being red and three white. If a man selects two balls at random from the bag, what is the probability that he will get one ball of each colour?

Ans. $\dfrac{{}^{5}C_{1} \times {}^{3}C_{1}}{{}^{8}C_{2}} = \dfrac{15}{28}$


13. A bag contains 4 white, 5 red and 6 green balls. Three balls are drawn at random. What is the probability that a white, a red and a green ball are drawn?

Ans. 24/91.

14. A bag contains 7 red balls and 5 white balls. 4 balls are drawn at random. What is the probability that (i) all of them are red; (ii) two of them are red and two white?

Ans. (i) 7/99, (ii) 14/33.

15. A bag contains 8 black, 3 red and 9 white balls. If 3 balls are drawn at random, find the probability that (a) all are black, (b) 2 are black and 1 is white, (c) 1 is of each colour, (d) the balls are drawn in the order black, red and white, and (e) none is red.

Ans. (a) $\dfrac{14}{285}$, (b) $\dfrac{21}{95}$, (c) $\dfrac{18}{95}$, (d) $\dfrac{3}{95}$, (e) $\dfrac{34}{57}$.

16. A bag contains 10 white, 6 red, 4 black and 7 blue balls; 5 balls are drawn at random. What is the probability that 2 of them are red and one black?

Ans. $\dfrac{{}^{6}C_{2} \times {}^{4}C_{1} \times {}^{17}C_{2}}{{}^{27}C_{5}}$


17. The Federal Match Company has forty female employees and sixty male employees. If two employees are selected at random, what is the probability that
(i) both will be males, (ii) both will be females,
(iii) there will be one of each sex?

Since the three events are collectively exhaustive and mutually exclusive, what is the sum of the three probabilities? [Punjab Uni. B.Com., April 1978]

Ans. (i) 0.3576, (ii) 0.1576, (iii) 0.4848.

18. If a single draw is made from a pack of 52 cards, what is the probability of securing either an ace of spades or a jack of clubs? (Allahabad Uni. M.Com. 1970)

Ans. 1/26.

19. Four cards are drawn from a full pack of cards. Find the probability that two are spades and two are hearts. (Bombay Uni. B.Com., Nov. 1974)

Ans. $\dfrac{{}^{13}C_{2} \times {}^{13}C_{2}}{{}^{52}C_{4}} = \dfrac{468}{20825}$

20. Four cards are drawn without replacement. What is the probability that they are all aces?

Ans. $\dfrac{1}{{}^{52}C_{4}} = \dfrac{1}{270725}$
21. From a pack of 52 cards 4 are accidentally dropped. Find the chance that (i) they will consist of a knave, a queen, a king and an ace, (ii) they are the 4 honours of the same suit, (iii) they be one from each suit, (iv) two of them are red and two are black.

Ans. (i) $\dfrac{256}{{}^{52}C_{4}}$, (ii) $\dfrac{4}{{}^{52}C_{4}}$, (iii) $\dfrac{13^{4}}{{}^{52}C_{4}}$, (iv) $\dfrac{{}^{26}C_{2} \times {}^{26}C_{2}}{{}^{52}C_{4}}$


22. The letters of the word TRIANGLE are arranged at random. Find the probability that the word so formed
(i) starts with T, (ii) ends with E,
(iii) starts with T and ends with E.

Ans. (i) $\dfrac{1}{8}$, (ii) $\dfrac{1}{8}$, (iii) $\dfrac{1}{56}$

23. In a random arrangement of the letters of the word VIOLENT, find the chance that the vowels I, O, E occupy odd positions only.

Ans. $\dfrac{{}^{4}C_{3} \times 3! \times 4!}{7!} = \dfrac{4}{35}$

24. In a random arrangement of the letters of the word ALLAHABAD, find the chance that the vowels occupy the even places.

Ans. $\dfrac{5!}{2!} \div \dfrac{9!}{4!\,2!} = \dfrac{1}{126}$

25. The letters of the word ARRANGE are arranged at random. Find the chance that:
(i) The two R's come together.
(ii) The two R's do not come together.
(iii) The two R's and the two A's come together.

Ans. (i) $\dfrac{6!}{2!} \div \dfrac{7!}{2!\,2!} = 360 \div 1260 = \dfrac{2}{7}$, (ii) $(1260 - 360) \div 1260 = \dfrac{5}{7}$, (iii) $\dfrac{120}{1260} = \dfrac{2}{21}$

26. Twelve balls are distributed at random among three boxes. What is the probability that the first box will contain 3 balls?

Ans. $\dfrac{{}^{12}C_{3} \times 2^{9}}{3^{12}}$
12.7. Axiomatic Probability. The modern theory of probability is based on the axiomatic approach introduced by the Russian mathematician A.N. Kolmogorov in the 1930s. Kolmogorov axiomatised the theory of probability, and his small book 'Foundations of the Theory of Probability', published in 1933, introduces probability as a set function and is considered a classic. In the axiomatic approach, some concepts are laid down to start with, certain properties or postulates, commonly known as axioms, are defined, and from these axioms alone the entire theory is developed by the logic of deduction. The axiomatic definition of probability includes both the classical and empirical definitions of probability and at the same time is free from their drawbacks. Before giving the axiomatic definition of probability, we shall explain certain concepts used therein.


Sample Space. The set of all possible outcomes of a random experiment is known as the sample space and is denoted by S. In other words, the sample space is the set of all exhaustive cases of the random experiment. The outcomes of the experiment are also known as sample points. Mathematically, if $e_1, e_2, \ldots, e_n$ are the mutually exclusive possible outcomes of a random experiment, then the set $S = \{e_1, e_2, \ldots, e_n\}$ is said to be the sample space of the experiment. The elements of S possess the following properties:

(i) Each of the $e_i$'s ($i = 1, 2, \ldots, n$) is an outcome of the experiment.

(ii) Any repetition of the experiment results in an outcome corresponding to one and only one of the $e_i$'s.

Remark. We shall write n(S) to denote the number of elements, i.e., sample points, in S.


Illustrations. 1. If a coin is tossed at random, the sample space is $S = \{H, T\}$ and $n(S) = 2$.

If two coins are tossed, then the sample space is given by:
$$S = \{H, T\} \times \{H, T\} = \{HH, HT, TH, TT\}$$
and $n(S) = 4$.

In a toss of three coins,
$$S = \{H, T\} \times \{H, T\} \times \{H, T\} = \{HH, HT, TH, TT\} \times \{H, T\}$$
$$= \{HHH, HTH, THH, TTH, HHT, HTT, THT, TTT\}$$
and $n(S) = 8$.

2. If two dice are thrown, then the sample space consists of 36 points as given below:

(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)
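The sample spaces in these illustrations are Cartesian products, so they can be generated mechanically. A minimal Python sketch follows; the helper name `sample_space` is an assumption made for illustration only.

```python
from itertools import product

def sample_space(faces, trials):
    """All ordered outcomes of `trials` repetitions of an experiment
    whose single-trial outcomes are `faces` (a Cartesian product)."""
    return list(product(faces, repeat=trials))

# Coins: join each tuple such as ('H', 'T') into a string such as 'HT'.
two_coins = ["".join(p) for p in sample_space("HT", 2)]
three_coins = ["".join(p) for p in sample_space("HT", 3)]
# Dice: ordered pairs (first die, second die).
two_dice = sample_space(range(1, 7), 2)

print(two_coins, len(two_coins))    # ['HH', 'HT', 'TH', 'TT'] 4
print(len(three_coins), len(two_dice))  # 8 36
```

Note that `itertools.product` lists the points in lexicographic order, which differs from the order used in the text; the set of points is the same.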


Event. Of all the possible outcomes in the sample space of a random experiment, some outcomes satisfy a specified description, which we call an event. For example, as already discussed, in a toss of 3 coins the sample space is given by:
$$S = \{HHH, HTH, THH, TTH, HHT, HTT, THT, TTT\} = \{w_1, w_2, w_3, w_4, w_5, w_6, w_7, w_8\}, \text{ say} \qquad \ldots(12.18)$$
where $w_1 = HHH$, $w_2 = HTH$, $w_3 = THH$, ..., $w_8 = TTT$.

For this sample space we can define a number of events, some of which are given below:

$E_1$: Event of getting all heads
$= \{HHH\} = \{w_1\}$

$E_2$: Event of getting exactly two heads
$= \{HTH, THH, HHT\} = \{w_2, w_3, w_5\}$

$E_3$: Event of getting at least two heads
$= \{w_1, w_2, w_3, w_5\}$
$= \{w_1\} \cup \{w_2, w_3, w_5\}$
$= E_1 \cup E_2 \qquad (*)$
where $E_1$ and $E_2$ are disjoint.

$E_4$: Event of getting exactly one head
$= \{w_4, w_6, w_7\}$

$E_5$: Event of getting at least one head
$= \{w_1, w_2, w_3, w_4, w_5, w_6, w_7\}$
$= \{w_1, w_2, w_3, w_5\} \cup \{w_4, w_6, w_7\}$
$= E_3 \cup E_4$
$= E_1 \cup E_2 \cup E_4$ [From (*)]
where $E_1$, $E_2$ and $E_4$ are disjoint.

$E_6$: Event of getting all tails
$= \{TTT\} = \{w_8\}$.

Thus, rigorously speaking, an event may be defined as a non-empty subset of the sample space. Every event may be expressed as a disjoint union of the single-element subsets of S or a disjoint union of some subsets of S. Since events are nothing but sets, the algebra of sets may be used to deal with them.
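Because events are sets, the events defined above for a toss of three coins can be built and compared with ordinary set operations. The sketch below is illustrative only; the helper `event` and the use of head counts as predicates are assumptions of this example.

```python
from itertools import product

# Sample space for a toss of three coins, as strings such as 'HTH'.
S = {"".join(p) for p in product("HT", repeat=3)}

def event(predicate):
    """Subset of S whose sample points satisfy the given description."""
    return {w for w in S if predicate(w)}

E1 = event(lambda w: w.count("H") == 3)   # all heads
E2 = event(lambda w: w.count("H") == 2)   # exactly two heads
E3 = event(lambda w: w.count("H") >= 2)   # at least two heads
E4 = event(lambda w: w.count("H") == 1)   # exactly one head
E5 = event(lambda w: w.count("H") >= 1)   # at least one head
E6 = event(lambda w: w.count("H") == 0)   # all tails

# Set algebra reproduces the decompositions given in the text.
print(E3 == E1 | E2, E1 & E2 == set())    # True True
print(E5 == E1 | E2 | E4)                 # True
```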


Two events A and B are said to be disjoint or mutually exclusive if they cannot happen simultaneously, i.e., if their intersection is a null set. Thus if A and B are disjoint events, then
$$A \cap B = \phi \;\Rightarrow\; P(A \cap B) = P(\phi) = 0 \qquad \ldots(12.19)$$
Thus $P(A \cap B) = 0$ provides us with a criterion for finding if A and B are mutually exclusive.


Axiomatic Probability (Definition). Given a sample space of a random experiment, the probability of the occurrence of any event A is defined as a set function P(A) satisfying the following axioms.

Axiom 1. P(A) is defined, is real and non-negative, i.e.,
$$P(A) \geq 0$$

Axiom 2. $P(S) = 1 \qquad \ldots(12.20)$

Axiom 3. If $A_1, A_2, \ldots, A_n$ is any finite or infinite sequence of disjoint events of S, then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) \quad \text{or} \quad P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i) \qquad \ldots(12.21)$$

The above axioms are known as the axioms of positiveness, certainty and union respectively.
Events as Sets: Glossary of Probability Terms. If A and B are two events, then:

$A \cup B$: An event which represents the happening of at least one of the events A and B (i.e., either A occurs or B occurs or both A and B occur).

$A \cap B$: An event which represents the simultaneous happening of both the events A and B.

$\bar{A}$: A does not happen.

$\bar{A} \cap \bar{B}$: Neither A nor B happens, i.e., none of A and B happens.

$\bar{A} \cap B$: A does not happen but B happens.

$(A \cap \bar{B}) \cup (\bar{A} \cap B)$: Exactly one of the two events A and B happens.

The above notations can be generalised for n events, say, $A_1, A_2, \ldots, A_n$. Thus:

$A_1 \cap A_2 \cap \ldots \cap A_n$: A compound event which represents the simultaneous happening of all the events $A_1, A_2, \ldots, A_n$.

$A_1 \cup A_2 \cup \ldots \cup A_n$: An event which represents the happening of at least one of the events $A_1, A_2, \ldots, A_n$. This involves the events of the type $A_1, A_2, \ldots, A_n$ (one at a time); $A_i \cap A_j$ ($i \neq j = 1, 2, \ldots, n$), i.e., simultaneous happening of two at a time; $A_i \cap A_j \cap A_k$ ($i \neq j \neq k = 1, 2, \ldots, n$), i.e., simultaneous happening of three at a time; ...; and $A_1 \cap A_2 \cap \ldots \cap A_n$, i.e., all the n at a time. However, if $A_1, A_2, \ldots, A_n$ are mutually disjoint, they can't happen simultaneously, i.e., $A_i \cap A_j$, $A_i \cap A_j \cap A_k$, ..., $A_1 \cap A_2 \cap \ldots \cap A_n$ are all null events, and in that case $A_1 \cup A_2 \cup \ldots \cup A_n$ will represent the happening of any one of the events $A_1, A_2, \ldots, A_n$.

Probability: Mathematical Notion. ... the sample space of a random experiment with ... the elements of S and ... a given event often ... For example, if a die is thrown three times, then the total number of sample points would be $6^3 = 216$, and if 3 cards are drawn without replacement there would be $52 \times 51 \times 50 = 132{,}600$ sample points. To write them is a very ... However, in such situations the computation of probabilities can be facilitated to a great extent by the two fundamental theorems of probability, the addition theorem and the multiplication theorem, discussed below.


12.8. Addition Theorem of Probability

Theorem 12.9. The probability of occurrence of at least one of the two events A and B is given by:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad \ldots(12.22)$$

Proof. Let us suppose that a random experiment results in a sample space S with N sample points (exhaustive number of cases). Then by definition:
$$P(A \cup B) = \frac{n(A \cup B)}{N} \qquad \ldots(12.23)$$
where $n(A \cup B)$ is the number of occurrences (sample points) favourable to the event $(A \cup B)$.

[Fig. 12.5: Venn diagram of the events A and B]

From the above diagram, we get:
$$P(A \cup B) = \frac{[n(A) - n(A \cap B)] + n(A \cap B) + [n(B) - n(A \cap B)]}{N}$$
$$= \frac{n(A) + n(B) - n(A \cap B)}{N}$$
$$= \frac{n(A)}{N} + \frac{n(B)}{N} - \frac{n(A \cap B)}{N}$$
$$= P(A) + P(B) - P(A \cap B)$$
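As a quick numerical check of (12.22), the following minimal Python sketch counts sample points for a single throw of a die. The events A (an even number) and B (a number greater than 3) are illustrative choices and are not taken from the text.

```python
from fractions import Fraction

S = set(range(1, 7))                  # sample space of one die, N = 6
A = {x for x in S if x % 2 == 0}      # assumed event: even number
B = {x for x in S if x > 3}           # assumed event: number greater than 3

def P(event):
    """Classical probability n(E)/N for equally likely sample points."""
    return Fraction(len(event), len(S))

lhs = P(A | B)                        # P(A ∪ B) by direct counting
rhs = P(A) + P(B) - P(A & B)          # right-hand side of (12.22)
print(lhs, rhs)                       # 2/3 2/3
```

Subtracting $P(A \cap B)$ corrects for the points 4 and 6, which would otherwise be counted once in A and once again in B.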


12.8.1. Addition Law of Probability for Mutually Exclusive Events. If the events A and B are mutually disjoint, i.e., if $A \cap B = \phi$, then
$$P(A \cap B) = \frac{n(A \cap B)}{N} = \frac{n(\phi)}{N} = 0 \qquad (*)$$
because $n(\phi) = 0$, as a null set does not contain any sample point. In case of disjoint events, $A \cup B$ represents the happening of any one of the events A and B. Hence, substituting from (*) in (12.22), we get the addition theorem as follows:

Theorem 12.10. The probability of happening of any one of the two mutually disjoint events is equal to the sum of their individual probabilities. Symbolically, for disjoint events A and B,
$$P(A \cup B) = P(A) + P(B) \qquad \ldots(12.24)$$

12.8.2. Generalisation of (12.22). For three events A, B and C, the probability of the occurrence of at least one of them is given by
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \qquad \ldots(12.25)$$
In particular, if A, B and C are mutually exclusive (disjoint), then
$$A \cap B = A \cap C = B \cap C = \phi \quad \text{and} \quad A \cap B \cap C = \phi$$
$$\Rightarrow \quad n(A \cap B) = n(A \cap C) = n(B \cap C) = n(A \cap B \cap C) = 0$$
Hence, substituting in (12.25), the probability of occurrence of any one of the mutually exclusive events A, B and C is equal to the sum of their individual probabilities, given by:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) \qquad \ldots(12.26)$$
In general, if $A_1, A_2, \ldots, A_n$ are mutually exclusive, then
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(A_1) + P(A_2) + \ldots + P(A_n) \qquad \ldots(12.27)$$
i.e., the probability of occurrence of any one of the n mutually disjoint events $A_1, A_2, \ldots, A_n$ is equal to the sum of their individual probabilities.
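Formula (12.25) can be checked by counting on a small sample space. The sketch below assumes, purely for illustration, that a ticket is drawn from tickets numbered 1 to 30 and that A, B and C are the events "multiple of 2", "multiple of 3" and "multiple of 5"; these events overlap, so all seven terms of (12.25) are exercised.

```python
from fractions import Fraction

S = set(range(1, 31))                       # tickets numbered 1 to 30
A = {x for x in S if x % 2 == 0}            # assumed event: multiple of 2
B = {x for x in S if x % 3 == 0}            # assumed event: multiple of 3
C = {x for x in S if x % 5 == 0}            # assumed event: multiple of 5

def P(event):
    """Classical probability n(E)/N."""
    return Fraction(len(event), len(S))

direct = P(A | B | C)                       # counting the union directly
by_formula = (P(A) + P(B) + P(C)
              - P(A & B) - P(B & C) - P(A & C)
              + P(A & B & C))               # right-hand side of (12.25)
print(direct, by_formula)                   # 11/15 11/15
```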


Important Remark. How to use the Addition Theorem in Numerical Problems. Suppose we want to find the probability of occurrence of an event A. Let the mutually exclusive forms of A be $A_1, A_2, \ldots, A_n$, so that
$$A = A_1 \cup A_2 \cup \ldots \cup A_n$$
where $A_1, A_2, \ldots, A_n$ are mutually exclusive. Hence using (12.27) we get
$$P(A) = P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(A_1) + P(A_2) + \ldots + P(A_n)$$
Hence the working rule for numerical problems may be summarised as follows:

"The probability of occurrence of any event A is the sum of the probabilities of happening of all its possible mutually exclusive forms $A_1, A_2, \ldots, A_n$."


12.9. Theorem of Compound Probability or Multiplication Law of Probability

Theorem 12.11. The probability of simultaneous happening of two events A and B is given by:
$$P(A \cap B) = P(A) \cdot P(B/A); \quad P(A) \neq 0$$
or
$$P(B \cap A) = P(B) \cdot P(A/B); \quad P(B) \neq 0 \qquad \ldots(12.28)$$
where P(B/A) is the conditional probability of happening of B under the condition that A has happened, and P(A/B) is the conditional probability of happening of A under the condition that B has happened.

Proof. Let A and B be the events associated with the sample space S of a random experiment with exhaustive number of outcomes (sample points) N, i.e., $n(S) = N$. Then by definition:
$$P(A \cap B) = \frac{n(A \cap B)}{N} \qquad \ldots(12.29)$$
For the conditional event A/B (i.e., the happening of A under the condition that B has happened), the favourable outcomes (sample points) must be out of the sample points of B. In other words, for the event A/B, the sample space is B and hence
$$P(A/B) = \frac{n(A \cap B)}{n(B)} \qquad \ldots(12.30)$$
Similarly, we have
$$P(B/A) = \frac{n(A \cap B)}{n(A)} \qquad \ldots(12.31)$$
Rewriting (12.29), we get:
$$P(A \cap B) = \frac{n(A)}{N} \times \frac{n(A \cap B)}{n(A)} = P(A) \cdot P(B/A) \quad \text{[From (12.31)]}$$
Also
$$P(A \cap B) = \frac{n(B)}{N} \times \frac{n(A \cap B)}{n(B)} = P(B) \cdot P(A/B) \quad \text{[From (12.30)]}$$

Remark. Multiplicative Law for Independent Events. If A and B are independent, so that the probability of occurrence or non-occurrence of A is not affected by the occurrence or non-occurrence of B, we have
$$P(A/B) = P(A) \quad \text{and} \quad P(B/A) = P(B) \qquad \ldots(12.32)$$
Hence substituting in (12.28) we get:
$$P(A \cap B) = P(A) \cdot P(B) \qquad \ldots(12.33)$$
Hence the probability of simultaneous happening of two independent events is equal to the product of their individual probabilities.

Generalisation. The multiplication law of probability can be extended to more than two events. Thus, for three events $A_1$, $A_2$ and $A_3$ we have
$$P(A_1 \cap A_2 \cap A_3) = P(A_1)\, P(A_2/A_1)\, P(A_3/A_1 \cap A_2) \qquad \ldots(12.34)$$
For n events $A_1, A_2, \ldots, A_n$ we have:
$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1)\, P(A_2/A_1)\, P(A_3/A_1 \cap A_2) \times \ldots \times P(A_n/A_1 \cap A_2 \cap \ldots \cap A_{n-1}) \qquad \ldots(12.34a)$$
In particular, if $A_1, A_2, \ldots, A_n$ are independent events, then
$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1)\, P(A_2) \ldots P(A_n) \qquad \ldots(12.35)$$
i.e., the probability of the simultaneous happening of n independent events is equal to the product of their individual probabilities.

We shall now give some results in the form of theorems, which will be frequently used in the solution of numerical problems.

Theorem 12.12. $P(\bar{A}) = 1 - P(A) \qquad \ldots(12.36)$

Theorem 12.13. (i) $P(\bar{A} \cap B) = P(B) - P(A \cap B) \qquad \ldots(12.37)$
(ii) $P(A \cap \bar{B}) = P(A) - P(A \cap B) \qquad \ldots(12.38)$

We know that for every event E, $P(E) \geq 0$. Hence from (12.37) we get
$$P(A \cap B) \leq P(B) \qquad \ldots(12.39)$$
Similarly from (12.38) we get
$$P(A \cap B) \leq P(A) \qquad \ldots(12.40)$$

Theorem 12.14. If $A \subset B$, then $P(A) \leq P(B) \qquad \ldots(12.41)$

Remark. The results in (12.39) and (12.40) can be immediately deduced from (12.41), since $A \cap B \subset A$ and $A \cap B \subset B$.

Theorem 12.15. If events A and B are independent, then the complementary events $\bar{A}$ and $\bar{B}$ are also independent.

Remark. In fact, we have the following results. If A and B are independent, then
(a) A and $\bar{B}$ are independent,
(b) $\bar{A}$ and B are independent.

Theorem 12.16. If $A_1, A_2, \ldots, A_n$ are independent events with respective probabilities of occurrence $p_1, p_2, \ldots, p_n$, then the probability of occurrence of at least one of them is given by
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = 1 - (1 - p_1)(1 - p_2) \ldots (1 - p_n) \qquad \ldots(12.42)$$

Proof. We are given:
$$P(A_i) = p_i \;\Rightarrow\; P(\bar{A}_i) = 1 - p_i \qquad (i)$$
We know that for any event E,
$$P(E) + P(\bar{E}) = 1 \qquad (ii)$$
Taking $E = A_1 \cup A_2 \cup \ldots \cup A_n$ in (ii), we get
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) + P(\overline{A_1 \cup A_2 \cup \ldots \cup A_n}) = 1$$
$$\Rightarrow \quad P(A_1 \cup A_2 \cup \ldots \cup A_n) + P(\bar{A}_1 \cap \bar{A}_2 \cap \ldots \cap \bar{A}_n) = 1 \qquad \ldots(12.43)$$
[By De Morgan's law of complementation, i.e., the complement of the union of sets is equal to the intersection of their complements.]
$$\Rightarrow \quad P(A_1 \cup A_2 \cup \ldots \cup A_n) = 1 - P(\bar{A}_1 \cap \bar{A}_2 \cap \ldots \cap \bar{A}_n) \qquad \ldots(12.44)$$
$$= 1 - P(\bar{A}_1)\, P(\bar{A}_2) \ldots P(\bar{A}_n),$$
by the compound probability theorem, since $A_1, A_2, \ldots, A_n$ and consequently $\bar{A}_1, \bar{A}_2, \ldots, \bar{A}_n$ are independent [cf. Theorem 12.15]. Hence substituting from (i) we get:
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = 1 - (1 - p_1)(1 - p_2) \ldots (1 - p_n)$$

Remark. The results in (12.43) and (12.44) are very important and are used quite often in numerical problems. Result (12.43) may be stated in words as:

P {Happening of at least one of the events $A_1, A_2, \ldots, A_n$}
= 1 - P {None of the events $A_1, A_2, \ldots, A_n$ happens} ...(12.45)

or equivalently,

P {None of the given events happens}
= 1 - P {At least one of them happens} ...(12.45a)
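Result (12.42) translates into a one-line computation. The sketch below is a hedged illustration: the function name `at_least_one` is an assumption, and the check against direct counting uses three throws of a die, an example chosen here rather than taken from the text.

```python
from fractions import Fraction
from itertools import product
from math import prod

def at_least_one(probabilities):
    """P(A1 ∪ ... ∪ An) for independent events, following (12.42):
    one minus the product of the complementary probabilities."""
    return 1 - prod(1 - p for p in probabilities)

# Check by direct counting: at least one six in three throws of a die.
throws = list(product(range(1, 7), repeat=3))          # 216 outcomes
counted = Fraction(sum(1 for t in throws if 6 in t), len(throws))
formula = at_least_one([Fraction(1, 6)] * 3)
print(counted, formula)                                 # 91/216 91/216
```

Working through the complement needs a single product, whereas expanding the union directly would require every term of the generalised addition theorem.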


We shall now discuss numerical problems, explaining the use of the addition and multiplication theorems of probability.


Example 12.12. Let E denote the experiment of tossing a coin three times in succession. Construct the sample space S. Write down the elements of the two events $E_1$ and $E_2$, where $E_1$ is the event that the number of heads exceeds the number of tails and $E_2$ is the event of getting head in the first trial. Find the probabilities $P(E_1)$ and $P(E_2)$, assuming that all the elements of S are equally likely to occur. [I.C.W.A. (Final), June 1984]

Solution. The sample space S in a random experiment of "tossing a coin three times in succession" is given by (H = Head; T = Tail):
$$S = \{H, T\} \times \{H, T\} \times \{H, T\}$$
$$= \{H, T\} \times \{HH, HT, TH, TT\}$$
$$= \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
The number of elements in the sample space, i.e., the exhaustive number of cases, is given by $n(S) = 8$.

The event $E_1$: "Number of heads exceeds the number of tails" in a random toss of 3 coins means we should get at least two heads, i.e., two heads and one tail, or all three heads. Thus the sample points of $E_1$ are
$$E_1 = \{HHH, HHT, HTH, THH\}$$
and $n(E_1) = 4$.

Similarly, the event $E_2$: "Getting head in the first trial" is given by:
$$E_2 = \{HHH, HHT, HTH, HTT\}$$
and $n(E_2) = 4$.

If we assume that all the elements of S are equally likely to occur, then
$$P(E_1) = \frac{n(E_1)}{n(S)} = \frac{4}{8} = \frac{1}{2}$$
and
$$P(E_2) = \frac{n(E_2)}{n(S)} = \frac{4}{8} = \frac{1}{2}$$


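Example 12.13. A committee of 4 persons is to be formed from 3 officers of the production department, 4 officers of the purchase department, 2 officers of the sales department and 1 chartered accountant. Find the probability of forming the committee in the following manner:

(i) There must be one from each category.

(ii) It should have at least one from the purchase deptt.

(iii) The chartered accountant must be in the committee.
[C.A. (Intermediate), May 1983]

Solution. There are in all 3 + 4 + 2 + 1 = 10 people. A committee of 4 can be formed out of these 10 people in ${}^{10}C_{4}$ ways. Hence the exhaustive number of cases is:
$${}^{10}C_{4} = \frac{10 \times 9 \times 8 \times 7}{4!} = 210$$

(i) The number of favourable cases for the committee to consist of one member from each category (Production, Purchase, Sales and C.A.) is:
$${}^{3}C_{1} \times {}^{4}C_{1} \times {}^{2}C_{1} \times {}^{1}C_{1} = 3 \times 4 \times 2 \times 1 = 24$$
$\therefore$ Required probability $= \dfrac{24}{210} = \dfrac{4}{35} = 0.1143$

(ii) The probability p that the committee has at least one member from the purchase department is given by:

p = P[1 from purchase deptt. and 3 others] + P[2 from purchase deptt. and 2 others]
+ P[3 from purchase deptt. and 1 other] + P[4 from purchase deptt.]
$$= \frac{{}^{4}C_{1} \times {}^{6}C_{3}}{{}^{10}C_{4}} + \frac{{}^{4}C_{2} \times {}^{6}C_{2}}{{}^{10}C_{4}} + \frac{{}^{4}C_{3} \times {}^{6}C_{1}}{{}^{10}C_{4}} + \frac{{}^{4}C_{4}}{{}^{10}C_{4}}$$
$$= \frac{1}{210}\left[4 \times \frac{6 \times 5 \times 4}{3!} + \frac{4 \times 3}{2!} \times \frac{6 \times 5}{2!} + 4 \times 6 + 1\right]$$
$$= \frac{80 + 90 + 24 + 1}{210} = \frac{195}{210} = \frac{13}{14} = 0.9286$$

(iii) The probability $p_1$ that the chartered accountant must be in the committee of 4 is given by:

$p_1$ = P[Chartered Accountant and 3 others]
$$= \frac{{}^{1}C_{1} \times {}^{9}C_{3}}{{}^{10}C_{4}} = \frac{9 \times 8 \times 7}{3!} \times \frac{4!}{10 \times 9 \times 8 \times 7} = \frac{4}{10} = 0.4$$

(ii) Aliter. The required probability is given by:

p = 1 - P[There is no person from purchase deptt.]
$$= 1 - \frac{{}^{6}C_{4}}{{}^{10}C_{4}}$$
[Because all the four persons must then be selected from the production and sales deptts. and the C.A.]
$$= 1 - \frac{6 \times 5 \times 4 \times 3}{10 \times 9 \times 8 \times 7} = 1 - \frac{1}{14} = \frac{13}{14}$$

Example 12.14. A committee of four has to be formed from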
among 3 economists, 4 engineers, 2 statisticians and 1 doctor.
(i) What is the probability that each of the four professions is represented on the committee?
(ii) What is the probability that the committee consists of the doctor and at least one economist?
[Delhi Uni. B.A. (Econ. Hons. I), 1983]

Solution. There are 3 + 4 + 2 + 1 = 10 members in all, and a committee of 4 out of them can be formed in ${}^{10}C_{4}$ ways. Hence the exhaustive number of cases is:
$${}^{10}C_{4} = \frac{10!}{4!\,6!} = 210$$

(i) The favourable number of cases for the committee to consist of 4 members, one of each profession, is:
$${}^{3}C_{1} \times {}^{4}C_{1} \times {}^{2}C_{1} \times 1 = 3 \times 4 \times 2 = 24$$
$\therefore$ Required probability $= \dfrac{24}{210} = \dfrac{4}{35}$

(ii) The probability p that the committee consists of the doctor and at least one economist is given by:

p = P[One doctor, one economist, 2 others]
+ P[One doctor, two economists, 1 other]
+ P[One doctor, 3 economists]
$$= \frac{{}^{1}C_{1} \times {}^{3}C_{1} \times {}^{6}C_{2}}{{}^{10}C_{4}} + \frac{{}^{1}C_{1} \times {}^{3}C_{2} \times {}^{6}C_{1}}{{}^{10}C_{4}} + \frac{{}^{1}C_{1} \times {}^{3}C_{3}}{{}^{10}C_{4}}$$
$$= \frac{1}{210}\left[1 \times 3 \times \frac{6 \times 5}{2} + 1 \times 3 \times 6 + 1 \times 1\right]$$
$$= \frac{1}{210}(45 + 18 + 1) = \frac{64}{210} = \frac{32}{105} = 0.3048$$

Example 12.15. A card is drawn from a well shuffled pack of playing cards. Find the probability that it is either a diamond or a king.

Solution. Let A denote the event of drawing a diamond and B denote the event of drawing a king from a pack of cards. Then we have
$$P(A) = \frac{13}{52} = \frac{1}{4} \quad \text{and} \quad P(B) = \frac{4}{52} = \frac{1}{13}$$
and we want $P(A \cup B)$. Now
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \qquad (*)$$
There is only one case favourable to the event $A \cap B$, viz., the king of diamonds. Hence $P(A \cap B) = \dfrac{1}{52}$.

Substituting in (*), we get
$$P(A \cup B) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$

Example 12.16. Let A and B be the two possible outcomes of an experiment and suppose
$$P(A) = 0.4, \quad P(A \cup B) = 0.7 \quad \text{and} \quad P(B) = p$$
(i) For what choice of p are A and B mutually exclusive?
(ii) For what choice of p are A and B independent?

Solution. (i) We have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
$$\Rightarrow \quad P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.4 + p - 0.7 = p - 0.3$$
If A and B are mutually exclusive, then
$$P(A \cap B) = 0 \;\Rightarrow\; p - 0.3 = 0 \;\Rightarrow\; p = 0.3$$
(ii) A and B are independent if
$$P(A \cap B) = P(A) \cdot P(B)$$
$$\Rightarrow \quad p - 0.3 = (0.4) \times p$$
$$\Rightarrow \quad (1 - 0.4)p = 0.3 \;\Rightarrow\; 0.6p = 0.3 \;\Rightarrow\; p = \frac{0.3}{0.6} = 0.5$$
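The two cases of Example 12.16 amount to solving the addition theorem for p under different assumptions about $P(A \cap B)$. A small Python sketch, with hypothetical function names, makes the two rearrangements explicit.

```python
from fractions import Fraction

def p_if_mutually_exclusive(p_a, p_union):
    """P(A ∩ B) = 0, so P(A ∪ B) = P(A) + p and p = P(A ∪ B) - P(A)."""
    return p_union - p_a

def p_if_independent(p_a, p_union):
    """P(A ∩ B) = P(A).p, so P(A ∪ B) = P(A) + p - P(A).p,
    which gives p = (P(A ∪ B) - P(A)) / (1 - P(A))."""
    return (p_union - p_a) / (1 - p_a)

p_a, p_union = Fraction(4, 10), Fraction(7, 10)      # data of Example 12.16
print(p_if_mutually_exclusive(p_a, p_union))         # 3/10
print(p_if_independent(p_a, p_union))                # 1/2
```

Exact fractions are used instead of floats so that the results match the hand computation without rounding noise.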


Example 12.17. In a certain college, the students engage in various sports in the following proportions:
Football (F): 60% of all students
Basketball (B): 50% of all students
Both football and basketball: 30% of all students.
If a student is selected at random, what is the probability that he will:
(i) play football or basketball?
(ii) play neither sport?
[Delhi Uni. B.A. (Econ. Hons. I), 1983]

Solution. Let A denote the event that the student is engaged in football and B denote the event that he is engaged in basketball. Then we are given:
$$P(A) = 0.60, \quad P(B) = 0.50, \quad P(A \cap B) = 0.30$$
(i) The probability that a student selected at random plays football or basketball is given by:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.60 + 0.50 - 0.30 = 0.80$$
(ii) The probability that the student plays neither football nor basketball is given by:
$$P(\bar{A} \cap \bar{B}) = 1 - P[\text{He plays at least one of the two games}] = 1 - 0.80 = 0.20$$
Aliter. We have $P(A \cap B) = 0.30 = P(A) \cdot P(B)$
$\Rightarrow$ A and B are independent, and hence $\bar{A}$ and $\bar{B}$ are also independent.
$$P(\bar{A} \cap \bar{B}) = P(\bar{A}) \cdot P(\bar{B}) = [1 - P(A)][1 - P(B)] = (1 - 0.6)(1 - 0.5) = 0.4 \times 0.5 = 0.20$$
Example 12.18. A Chartered Accountant applies for a job in two firms X and Y. He estimates that the probability of his being selected in firm X is 0.7, and being rejected at Y is 0.5, and the probability of at least one of his applications being rejected is 0.6. What is the probability that he will be selected in one of the firms? [Panjab Uni. B.Com., Sept. 1980]

Solution. Let A and B denote the events that the chartered accountant is selected in firms X and Y respectively. Then in the usual notations, we are given:
$$P(A) = 0.7 \;\Rightarrow\; P(\bar{A}) = 1 - 0.7 = 0.3$$
$$P(\bar{B}) = 0.5 \;\Rightarrow\; P(B) = 1 - 0.5 = 0.5$$
and $P(\bar{A} \cup \bar{B}) = 0.6$.

We know (by De Morgan's law)
$$\overline{A \cap B} = \bar{A} \cup \bar{B}$$
$$\Rightarrow \quad P(A \cap B) = 1 - P(\overline{A \cap B}) = 1 - P(\bar{A} \cup \bar{B})$$
$$\therefore \quad P(A \cap B) = 1 - 0.6 = 0.4$$
The probability that the chartered accountant will be selected in one of the two firms X or Y is given by:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.7 + 0.5 - 0.4 = 0.8$$
Example 12.19. The probability that a man will be alive 25 years hence is 0.3 and the probability that his wife will be alive 25 years hence is 0.4. Find the probability that 25 years hence (i) both will be alive, (ii) only the man will be alive, (iii) only the woman will be alive, (iv) at least one of them will be alive. (Bombay Uni. B.Com., Nov. 1980)

Solution. Let us define the following events:
A: The man will be alive 25 years hence.
B: His wife will be alive 25 years hence.
We are given $P(A) = 0.3$ and $P(B) = 0.4$.

(i) The probability that 25 years hence both the man and his wife will be alive is
$$P(A \cap B) = P(A) \cdot P(B) \quad (\because \text{A and B are independent})$$
$$= 0.3 \times 0.4 = 0.12$$

(ii) The probability that 25 years hence only the man will be alive is
$$P(A \cap \bar{B}) = P(A) \cdot P(\bar{B}) = P(A)[1 - P(B)] = 0.3 \times (1 - 0.4) = 0.3 \times 0.6 = 0.18$$

(iii) The probability that only the woman will be alive 25 years hence is
$$P(\bar{A} \cap B) = P(\bar{A}) \times P(B) = [1 - P(A)] \times P(B) = (1 - 0.3) \times 0.4 = 0.7 \times 0.4 = 0.28$$

(iv) The probability p that 25 years hence at least one of them will be alive is
$$p = 1 - P(\text{None will be alive}) = 1 - P(\bar{A} \cap \bar{B}) = 1 - P(\bar{A}) \times P(\bar{B})$$
$$= 1 - (1 - 0.3) \times (1 - 0.4) = 1 - 0.7 \times 0.6 = 1 - 0.42 = 0.58$$
Aliter. The required probability is:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = P(A) + P(B) - P(A) \cdot P(B)$$
$$= 0.3 + 0.4 - 0.3 \times 0.4 = 0.70 - 0.12 = 0.58$$
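The four answers of Example 12.19 follow from the same two numbers once independence is assumed, as the solution does. The sketch below (variable names are illustrative) also shows that the four mutually exclusive cases "both", "only the man", "only the wife" and "neither" exhaust the sample space.

```python
from fractions import Fraction

p_man, p_wife = Fraction(3, 10), Fraction(4, 10)   # data of Example 12.19

# Independence of the two survivals is an assumption of the solution.
both = p_man * p_wife
only_man = p_man * (1 - p_wife)
only_wife = (1 - p_man) * p_wife
neither = (1 - p_man) * (1 - p_wife)
at_least_one = 1 - neither

print(float(both), float(only_man), float(only_wife), float(at_least_one))
# 0.12 0.18 0.28 0.58
print(both + only_man + only_wife + neither)       # 1
```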
Example 12.20. The probability that India wins a cricket test match against England is given to be $\frac{1}{3}$. If India and England play three test matches, what is the probability that:

(i) India will lose all the three test matches?

(ii) India will win at least one test match?
[Delhi Uni., M.A. (Bus. Econ.), 1977]

Solution. Let $A_1$, $A_2$ and $A_3$ denote the events that India wins the first, second and third test match against England respectively. Then $\bar{A}_1$, $\bar{A}_2$ and $\bar{A}_3$ represent the complementary events that India loses the 1st, 2nd and 3rd test match respectively. We are given:
$$P(A_1) = P(A_2) = P(A_3) = \frac{1}{3} \;\Rightarrow\; P(\bar{A}_1) = P(\bar{A}_2) = P(\bar{A}_3) = 1 - \frac{1}{3} = \frac{2}{3}$$
(i) P[India will lose all the three matches]
$$= P(\bar{A}_1 \cap \bar{A}_2 \cap \bar{A}_3) = P(\bar{A}_1) \cdot P(\bar{A}_2) \cdot P(\bar{A}_3)$$
[By the compound probability theorem, since the events $A_1$, $A_2$, $A_3$ and consequently $\bar{A}_1$, $\bar{A}_2$, $\bar{A}_3$ are independent of each other.]
$$= \frac{2}{3} \times \frac{2}{3} \times \frac{2}{3} = \frac{8}{27}$$
(ii) P[India will win at least one test match]
$$= P(A_1 \cup A_2 \cup A_3) = 1 - P[\text{India will lose all the three matches}]$$
$$= 1 - \frac{8}{27} = \frac{19}{27} \quad \text{[From Part (i)]}$$

Example 12.21. А problem in Statistics is given to three 
students A, B and C whose chances of solving it are 4, + and 1 res- 
pectively. Find the probability that the problem will be solved. 


(Shivaji Uni. B. Com. 1978) 
Solution. Let E, E, and E, denote the events that the 
problem is solved by A, B and C respectively, Then we have 
Р(Е)=} > P(É)-1—PR(E)—$ 
P(E,)=t = P(E,)=1—P(E,)=} 
P(E)-i = P(E,)=1—P(E,)=$ 
The problem will be solved if at least one of the three is able to solve it. Hence the required probability that the problem is solved is given by:
$$P(E_1 \cup E_2 \cup E_3) = 1 - P(\bar{E}_1 \cap \bar{E}_2 \cap \bar{E}_3) = 1 - P(\bar{E}_1) \cdot P(\bar{E}_2) \cdot P(\bar{E}_3)$$
[By the compound probability theorem, since $E_1$, $E_2$ and $E_3$ are independent]

So 
me х4 X 

3 
-1-—- AT 


Example 12.22. Find the probability of throwing 6 at least once in six throws with a single die. [Kurukshetra Uni. B.Com. Sept. 1975; Calcutta Uni. B.A. Econ. (Hons.) 1976]

Solution. Let $E_i$ ($i = 1, 2, \ldots, 6$) denote the event of getting a 6 in the i-th throw of a single die. Then
$$P(E_i) = \frac{1}{6} \;\Rightarrow\; P(\bar{E}_i) = \frac{5}{6}; \quad (i = 1, 2, \ldots, 6)$$
The probability that in six throws of a single die we get 6 at least once is given by:
$$P(E_1 \cup E_2 \cup E_3 \cup E_4 \cup E_5 \cup E_6) = 1 - P(\bar{E}_1 \cap \bar{E}_2 \cap \bar{E}_3 \cap \bar{E}_4 \cap \bar{E}_5 \cap \bar{E}_6)$$
$$= 1 - P(\bar{E}_1) \cdot P(\bar{E}_2) \cdot P(\bar{E}_3) \cdot P(\bar{E}_4) \cdot P(\bar{E}_5) \cdot P(\bar{E}_6)$$
[$\because$ $E_1, E_2, \ldots, E_6$ and consequently $\bar{E}_1, \bar{E}_2, \ldots, \bar{E}_6$ are independent, since the throws of the die are independent.]
$$= 1 - \left(\frac{5}{6}\right)^{6}$$
Example 12.23. The odds that A speaks the truth are 3 : 2 and the odds that B speaks the truth are 5 : 3. In what percentage of cases are they likely to contradict each other on an identical point? (Diploma in Management, A.I.M.A., 1977)

Solution. Let us define the events:
$E_1$: A speaks the truth.
$E_2$: B speaks the truth.
Then $\bar{E}_1$ and $\bar{E}_2$ represent the complementary events that A and B tell a lie respectively. We are given:
$$P(E_1) = \frac{3}{5} \;\Rightarrow\; P(\bar{E}_1) = 1 - \frac{3}{5} = \frac{2}{5}$$
and
$$P(E_2) = \frac{5}{8} \;\Rightarrow\; P(\bar{E}_2) = 1 - \frac{5}{8} = \frac{3}{8}$$
The event E that A and B contradict each other on an identical point can happen in the following mutually exclusive ways:
(i) A speaks the truth and B tells a lie, i.e., the event $E_1 \cap \bar{E}_2$ happens.
(ii) A tells a lie and B speaks the truth, i.e., the event $\bar{E}_1 \cap E_2$ happens.
Hence by the addition theorem of probability:
$$P(E) = P(i) + P(ii) = P(E_1 \cap \bar{E}_2) + P(\bar{E}_1 \cap E_2)$$
$$= P(E_1) \cdot P(\bar{E}_2) + P(\bar{E}_1) \cdot P(E_2),$$
by the compound probability theorem, since $E_1$ and $E_2$ are independent,
$$= \frac{3}{5} \times \frac{3}{8} + \frac{2}{5} \times \frac{5}{8} = \frac{9 + 10}{40} = \frac{19}{40} = 0.475$$
Hence A and B contradict each other on an identical point in 47.5% of the cases.
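The conversion from odds to probabilities and the two mutually exclusive cases of Example 12.23 can be written out as a short Python sketch. The function name `odds_in_favour_to_probability` is an assumption made for this illustration.

```python
from fractions import Fraction

def odds_in_favour_to_probability(a, b):
    """Odds of a : b in favour of an event correspond to P = a / (a + b)."""
    return Fraction(a, a + b)

p_a = odds_in_favour_to_probability(3, 2)   # A speaks the truth: 3/5
p_b = odds_in_favour_to_probability(5, 3)   # B speaks the truth: 5/8

# They contradict each other when exactly one of them speaks the truth;
# the two cases are mutually exclusive and A, B are assumed independent.
p_contradict = p_a * (1 - p_b) + (1 - p_a) * p_b
print(p_contradict, float(p_contradict) * 100)   # 19/40 47.5
```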


Example 12.24. Three groups of children contain respectively 
3 girls and 1 boy, 2 girls and 2 boys, 1 girl and 3 boys. One child is 
selected at random from each group. Show that the chance that the 
three selected consist of 1 girl 2 boys is 13/32. 


Solution. Let B₁, B₂ and B₃ be the events of drawing a boy
from the 1st, 2nd and 3rd group respectively and G₁, G₂ and G₃ be
the events of drawing a girl from the 1st, 2nd and 3rd group
respectively. Then

P(B₁) = 1/4,  P(B₂) = 2/4,  P(B₃) = 3/4
and P(G₁) = 3/4,  P(G₂) = 2/4,  P(G₃) = 1/4.

The required event of getting 1 girl and 2 boys in a random
selection of 3 children can materialise in the following mutually
exclusive cases :

(i) Girl from the first group and boys from the 2nd and 3rd
groups, i.e., the event G₁ ∩ B₂ ∩ B₃ happens.

(ii) Girl from the 2nd group and boys from the 1st and 3rd
groups, i.e., the event B₁ ∩ G₂ ∩ B₃ happens.

(iii) Girl from the third group and boys from the 1st and 2nd
groups, i.e., the event B₁ ∩ B₂ ∩ G₃ happens.

Hence by the addition theorem of probability, the required
probability is

P(i) + P(ii) + P(iii)
= P(G₁ ∩ B₂ ∩ B₃) + P(B₁ ∩ G₂ ∩ B₃) + P(B₁ ∩ B₂ ∩ G₃)
= P(G₁)P(B₂)P(B₃) + P(B₁)P(G₂)P(B₃) + P(B₁)P(B₂)P(G₃),
since the selections from the three groups are independent,
= 3/4 × 2/4 × 3/4 + 1/4 × 2/4 × 3/4 + 1/4 × 2/4 × 1/4
= (18 + 6 + 2)/64 = 26/64 = 13/32

Example 12.25. A market research firm is interested in
surveying certain attitudes in a small community. There are 125
households broken down according to income, ownership of a telephone or
ownership of a T.V.

                  Households with annual        Households with annual
                  income of Rs. 8,000 or less   income above Rs. 8,000
                  Telephone     No              Telephone     No
                  subscriber    telephone       subscriber    telephone
Own T.V. set         27           20               18           10
No T.V. set          18           10               12           10

(i) What is the probability of obtaining a T.V. owner in drawing
a household at random ?

(ii) If a household has income over Rs. 8,000 and is a telephone
subscriber, what is the probability that it has a T.V. ?

(iii) What is the conditional probability of drawing a household
that owns a T.V., given that the household is a telephone
subscriber ?

(iv) Are the events 'ownership of a T.V.' and 'telephone
subscriber' statistically independent ?

(Delhi Uni. M.B.A., 1971)

Solution. Let us define the following events :

A : The household owns a T.V.

B : The household is a telephone subscriber.

C : The household has annual income over Rs. 8,000.

Then from the given data we have :

P(A) = (27 + 20 + 18 + 10)/125 = 75/125 = 3/5
P(B) = (27 + 18 + 18 + 12)/125 = 75/125 = 3/5
P(C) = (18 + 12 + 10 + 10)/125 = 50/125 = 2/5

P(A ∩ B) = Pr (that the household owns a T.V. and
              is a telephone subscriber)
         = (27 + 18)/125 = 45/125 = 9/25
P(B ∩ C) = Pr [A household is a telephone subscriber
              and has an annual income over Rs. 8,000]
         = (18 + 12)/125 = 30/125 = 6/25
P(A ∩ B ∩ C) = Pr [A household owns a T.V., is a telephone
                  subscriber and has an annual income
                  over Rs. 8,000]
             = 18/125

(i) Required probability is P(A) = 3/5 = 0.6

(ii) Required probability is

P(A/B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C) = (18/125)/(30/125) = 3/5 = 0.6

(iii) Required probability is

P(A/B) = P(A ∩ B)/P(B) = (45/125)/(75/125) = 3/5 = 0.6

(iv) We have
P(A ∩ B) = 9/25
and P(A) × P(B) = 3/5 × 3/5 = 9/25.

Since P(A ∩ B) = P(A).P(B),
A and B are statistically independent.
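
The arithmetic of Example 12.25 can be checked mechanically. The following is only an illustrative sketch in Python: the dictionary layout, the helper name prob and the variable names are assumptions made here for illustration and are not part of the text; the counts are those of the table in the example.

```python
from fractions import Fraction

# Household counts from the table of Example 12.25, keyed by
# (owns_tv, is_telephone_subscriber, income_above_8000).
# The key layout is an illustrative assumption.
counts = {
    (True, True, False): 27, (True, False, False): 20,
    (True, True, True): 18, (True, False, True): 10,
    (False, True, False): 18, (False, False, False): 10,
    (False, True, True): 12, (False, False, True): 10,
}
total = sum(counts.values())  # 125 households in all


def prob(event):
    """Probability of the households whose key satisfies `event`."""
    favourable = sum(n for key, n in counts.items() if event(key))
    return Fraction(favourable, total)


p_a = prob(lambda k: k[0])             # owns a T.V.
p_b = prob(lambda k: k[1])             # telephone subscriber
p_ab = prob(lambda k: k[0] and k[1])   # A and B
p_bc = prob(lambda k: k[1] and k[2])   # B and C
p_abc = prob(lambda k: all(k))         # A and B and C

p_a_given_bc = p_abc / p_bc            # part (ii)
p_a_given_b = p_ab / p_b               # part (iii)
independent = (p_ab == p_a * p_b)      # part (iv)

print(p_a, p_a_given_bc, p_a_given_b, independent)
```

Working with exact fractions rather than decimals keeps the independence check in part (iv) free of rounding error.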


Example 12.26. The probability that a person stopping at a
petrol pump will get his tyres checked is 0.12, the probability that he
will get his oil checked is 0.29, and the probability that he will get
both checked is 0.07.
(i) What is the probability that a person stopping at this
pump will have neither his tyres nor oil checked ?
(ii) Find the probability that a person who has his oil checked
will also have his tyres checked.
[Delhi Uni. B.A. (Econ. Hons.), 1980]
Solution. Let A denote the event that a person stopping at
a petrol pump will have his tyres checked, and B denote the event
that he will get his oil checked. Then we are given :

P(A) = 0.12, P(B) = 0.29, P(A ∩ B) = 0.07

(i) The probability that a person stopping at this pump will
have neither his tyres nor oil checked is given by :

P(Ā ∩ B̄) = 1 - P(A ∪ B)
         = 1 - [P(A) + P(B) - P(A ∩ B)]
         = 1 - (0.12 + 0.29 - 0.07)
         = 1 - 0.34 = 0.66
(ii) Required probability = P(A/B) = P(A ∩ B)/P(B)
                          = 0.07/0.29 = 0.24
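
As a quick check of Example 12.26, the two results can be reproduced with exact fractions. This is a minimal sketch in Python; the variable names are illustrative assumptions, while the three given probabilities come from the example.

```python
from fractions import Fraction

# Given probabilities of Example 12.26, written as exact fractions.
p_a = Fraction(12, 100)    # tyres checked
p_b = Fraction(29, 100)    # oil checked
p_ab = Fraction(7, 100)    # both checked

# Addition theorem: P(A or B) = P(A) + P(B) - P(A and B)
p_union = p_a + p_b - p_ab

# (i) Neither checked is the complement of "at least one checked".
p_neither = 1 - p_union

# (ii) Conditional probability P(A/B) = P(A and B) / P(B)
p_a_given_b = p_ab / p_b

print(float(p_neither), round(float(p_a_given_b), 2))
```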


Example 12.27. There are 12 balls in a bag, 8 red and 4
green. Three balls are drawn successively without replacement. What
is the probability that they are alternately of the same colour ?

Solution. The required event can materialise in the following
mutually exclusive ways :

(i) The balls are red, green and red in the first, second and
third draw respectively.

(ii) The balls are green, red and green in the first, second and
third draw respectively.

Hence by the addition theorem of probability, the required
probability p is given by :
p = P(i) + P(ii)                                        (*)

Computation of P(i). Let A, B and C denote the events of
drawing a red, green and red ball in the 1st, 2nd and 3rd draw
respectively. Since the balls drawn are not replaced before the next
draw, the constitution of the bag in the three draws is respectively :

   8R, 4G        7R, 4G        7R, 3G
  1st draw      2nd draw      3rd draw

P(i) = P(A ∩ B ∩ C)
     = P(A).P(B/A).P(C/A ∩ B)
                   [By compound probability theorem]
     = 8/12 × 4/11 × 7/10 = 224/1320

Computation of P(ii). If the drawn balls are green, red and
green in the 1st, 2nd and 3rd draw respectively, then the constitution
of the bag for the three draws respectively is :

   8R, 4G        8R, 3G        7R, 3G
  1st draw      2nd draw      3rd draw

Hence by compound probability theorem,
P(ii) = 4/12 × 8/11 × 3/10 = 96/1320

Substituting in (*), we get :
p = 224/1320 + 96/1320 = 320/1320 = 8/33
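
The compound probability theorem for draws without replacement lends itself to a short loop: at each draw the probability is the count of the wanted colour over the current bag size, after which the bag is updated. The sketch below is illustrative only; the function name sequence_probability and the dictionary representation of the bag are assumptions, not part of the text.

```python
from fractions import Fraction


def sequence_probability(colours, bag):
    """Probability of drawing `colours` in order, without replacement."""
    bag = dict(bag)          # work on a copy so the caller's bag is unchanged
    p = Fraction(1)
    for colour in colours:
        p *= Fraction(bag[colour], sum(bag.values()))
        bag[colour] -= 1     # the drawn ball is not replaced
    return p


bag = {"red": 8, "green": 4}
p_rgr = sequence_probability(["red", "green", "red"], bag)
p_grg = sequence_probability(["green", "red", "green"], bag)

# The two orderings are mutually exclusive, so their probabilities add.
p = p_rgr + p_grg
print(p_rgr, p_grg, p)
```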
Example 12.28. A bag contains 5 white and 3 black balls,
another bag contains 4 white and 5 black balls. From any one of these
bags a single draw of two balls is made. Find the probability that
one of them would be white and the other black ball.
[Guru Nanak Dev. Uni. B. Com., 1975 ; Meerut Uni. M. Com.,
1976 ; Rajasthan Uni. M. Com., 1976]
Solution. Let us define the following events :

A₁ : First bag is selected.
A₂ : Second bag is selected.
B : In a draw of 2 balls, one is white and the other is black.

The required event of drawing one white ball and one black
ball in a draw of two balls can materialise in the following mutually
exclusive ways :

(i) A₁ ∩ B happens, (ii) A₂ ∩ B happens.
Hence by the addition theorem of probability, the required
probability p is given by :
p = P(i) + P(ii)
  = P(A₁ ∩ B) + P(A₂ ∩ B)
  = P(A₁).P(B/A₁) + P(A₂).P(B/A₂)                      (*)
Since there are two bags, the selection of each being equally
likely, we have :
P(A₁) = P(A₂) = 1/2

P(B/A₁) = Probability of drawing one white and one black
          ball in a draw of 2 balls from the first bag
        = (⁵C₁ × ³C₁)/⁸C₂ = (5 × 3 × 2)/(8 × 7) = 15/28

P(B/A₂) = Probability of drawing one white and one black
          ball in a draw of 2 balls from the 2nd bag
        = (⁴C₁ × ⁵C₁)/⁹C₂ = (4 × 5 × 2)/(9 × 8) = 5/9
Substituting in (*) we get :

p = 1/2 × 15/28 + 1/2 × 5/9 = 15/56 + 5/18
  = (135 + 140)/504 = 275/504
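
The computation in Example 12.28 combines a choice of bag with a draw from that bag. A minimal Python sketch follows, assuming a helper one_white_one_black (a hypothetical name introduced here) that evaluates the ratio of combinations used in the solution.

```python
from fractions import Fraction
from math import comb


def one_white_one_black(white, black):
    """P(one white and one black) when 2 balls are drawn from the bag."""
    favourable = comb(white, 1) * comb(black, 1)
    exhaustive = comb(white + black, 2)
    return Fraction(favourable, exhaustive)


p_bag = Fraction(1, 2)                    # each bag is equally likely
p_b_given_a1 = one_white_one_black(5, 3)  # first bag: 5 white, 3 black
p_b_given_a2 = one_white_one_black(4, 5)  # second bag: 4 white, 5 black

# p = P(A1).P(B/A1) + P(A2).P(B/A2)
p = p_bag * p_b_given_a1 + p_bag * p_b_given_a2
print(p_b_given_a1, p_b_given_a2, p)
```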


Example 12.29. A lady declares that by taking a cup of tea
she can discriminate whether the milk or tea infusion was first added
to the cup. It is proposed to test this assertion by means of an
experiment with 12 cups of tea, 6 made in one way and 6 in the other and
presenting them to the lady for judgement in a random order.

(i) Calculate the probability that on the null hypothesis that the
lady has no discrimination power she would judge correctly all the 12
cups, it being known to her that 6 are of each kind.

(ii) Suppose that the 12 cups were presented to the lady in six
pairs, each pair to consist of cups of each kind in a random order. How
would the probability of correctly judging with every cup on the same
null hypothesis be altered in this case ?
Which of the two designs would you prefer and why ?
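Solution. (i) The total number of ways in which 12 cups of
tea, 6 made in one way and 6 in the other, can be presented to the
lady at random is

12!/(6! 6!) = 924

Of these there is only one way in which the lady can judge all the
cups correctly.

∴ Required probability is 1/924.

(ii) When the cups are presented in six pairs, each pair containing
one cup of each kind, the lady (having no discrimination power) judges
each pair correctly with probability 1/2, independently of the other
pairs. Hence the probability of judging every cup correctly is

(1/2)⁶ = 1/64

The first method of testing is preferable to the second because
the probability of correctly judging all the cups is much less in the
first case as compared with the corresponding probability in the
second case.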
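
The two designs of Example 12.29 can be compared numerically. The sketch below is illustrative: it assumes, as the null hypothesis states, that the lady is guessing, so that under the first design every split of the 12 cups into 6 and 6 is equally likely and under the paired design each pair is judged correctly with probability 1/2. The variable names are assumptions made here.

```python
from fractions import Fraction
from math import comb

# Design 1: the lady picks which 6 of the 12 cups were made one way.
ways_design_1 = comb(12, 6)
p_design_1 = Fraction(1, ways_design_1)

# Design 2: six pairs, each pair has 2 possible orderings.
ways_design_2 = 2 ** 6
p_design_2 = Fraction(1, ways_design_2)

print(ways_design_1, p_design_1, ways_design_2, p_design_2)
```

The smaller the chance of a perfect score by pure guessing, the stronger the evidence a perfect score provides, which is why the first design is preferred.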
1. State and prove the addition law of probability for any two
events A and B. Rewrite the law when A and B are mutually exclusive.
(Bombay U. B. Com. April 1983)


2. (a) State and prove the Multiplication Theorem of Probability.
[Delhi U. B.A. (Econ. Hons. I) 1983]
(b) State and prove the multiplication theorem of probability. How is
the result modified if the events are not independent ?
[Delhi Uni. B.A. (Econ. Hons. I) 1985]


3. State the axioms of probability. 
[Delhi Uni. B.A. (Econ, Hons, I) 1984] 


4. Explain with examples the rules of Addition and Multiplication in
probability.
[C.A. (Intermediate) Nov. 1977 ; Calicut Uni. M. Com. 1975]
5. (a) What do you understand by conditional probability ? If
Prob. (A + B) = Prob. A + Prob. B
are the two events A and B statistically independent ?
[Delhi Uni. M.A. (Econ. Hons.) 1977]


(b) Explain the meaning of conditional probability of an event. State
the addition and multiplication rules of probability.
[Delhi Uni. B.A. (Econ. Hons.), 1982]

6. Prove that for two events A and B,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
What happens if A and B are mutually exclusive ?


7. A statistical experiment consists of asking 3 housewives at random
if they wash their dishes with brand X detergent. List the elements of the
sample space S using the letter Y for 'yes' and N for 'no'. List the elements of
the event : "The second woman interviewed uses brand X". Find the probability
of this event if it is assumed that all the elements of S are equally likely to
occur.
[I.C.W.A. (Final), June 1983]
Ans. 1/2
8. Explain what is meant by sample space.
An unbiased coin is tossed three times. Construct the sample space S.
If E₁ denotes the event of 'getting exactly 2 heads' ; E₂ the event of 'getting at
least two tails' and E₃ the event of 'getting tail in the first toss' ; write down
the elements of these events and find the probabilities of their occurrence
assuming that all the elements of S are equally likely to occur.
Ans. 3/8, 1/2, 1/2
9. A card is drawn at random from a well shuffled pack of cards.
What is the probability that it is a heart or a queen ?
[C.A. (Intermediate), May 1982]
Ans. 4/13
10. The odds are 9 to 5 against a person who is 50 years living till he
is 70 and 8 to 6 against a person who is 60 living till he is 80. Find the
probability that at least one of them will be alive after 20 years.
[C.A. (Intermediate), May 1981]

Ans. 31/49.
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11. A candidate is selected for interview for three posts. For the first 

post there are 3 candidates, for the second there are four and for the third there
are ... What are his chances of getting at least one post ?
[C.A. (Intermediate), May 1980]


Ала. Read. prob.=1—( 1-+)( 1-4-( 1-5-)- i- 


12. A salesman has a 60 per cent chance of making a sale to each
customer. The behaviour of successive customers is independent. If two
customers A and B enter, what is the probability that the salesman will make a
sale to A or B ?

(Diploma in Management, A.I.M.A., 1977)

Ans. 0.84

13. A problem in statistics is given to three students A, B and C whose
chances of solving it are 1/3, 1/4 and 1/2 respectively. What is the probability
that the problem will be solved ? (Punjab Uni. B. Com., 1971)

Ans. 3/4.
14. A person is known to hit the target in 3 out of 4 shots, whereas
another person is known to hit the target in 2 out of 3 shots. Find the
probability of the target being hit at all when they both try.
(Punjab Uni. B.Com., 1981)
Ans. 11/12.

15. There are 3 economists, 4 engineers, 2 statisticians and 1 doctor.
A committee of 4 from among them is to be formed. Find the probability that
the committee :

(i) Consists of one of each kind ;
(ii) Has at least one economist ;
(iii) Has the doctor as a member and three others.
(Bombay Uni. B.Com., April 1974)
Ans. (i) 4/35, (ii) 5/6, (iii) 2/5.


16. Two vacancies exist at the junior executive level of a certain
company. Twenty people, fourteen men and six women, are eligible and equally
qualified. The company has decided to draw two names at random from the list
of eligibles. What is the probability that :

(a) both positions will be filled by women ?
(b) at least one of the positions will be filled by women ?
(c) neither of the positions will be filled by women ?

Ans. (a) ⁶C₂/²⁰C₂ = 3/38, (b) 1 - ¹⁴C₂/²⁰C₂ = 99/190, (c) ¹⁴C₂/²⁰C₂ = 91/190.

17. An urn contains 5 white, 3 black and 6 red balls. 3 balls are drawn
at random. Find the probability that :

(i) two of the balls drawn are white,
(ii) one is of each colour,
(iii) none is black,
(iv) at least one is white.
Ans. (i) 90/364, (ii) 90/364, (iii) 165/364, (iv) 280/364.
18. If a die is rolled 3 times, what is the probability of 5 coming up
at least once ? [C.A. (Intermediate), November 1983]
Ans. 91/216.
19. Two six-sided dice are tossed at a time. Find the probability of
getting 1 dot side of the first die and 5 dot side of the second die.
(Rajasthan Uni. M. Com. 1976)

Ans. 1/36.

20. The odds against student X solving a business statistics problem
are 8 : 6 and odds in favour of student Y solving the same problem are 14 : 16.

(i) What is the chance that the problem will be solved if they both
try, independently of each other ?
(ii) What is the probability that neither solves the problem ?
[C.A. (Intermediate), Nov. 1979]

Ans. (i) 73/105, (ii) 32/105.


21. Given P(A) = 1/4, P(B/A) = 1/2 and P(A/B) = 1/4, find if (i) A and B
are mutually exclusive, (ii) A and B are independent.

Ans. (i) A and B are not mutually exclusive.
(ii) A and B are independent.

22. If two perfect dice are thrown what is the probability of getting :

(a) a three on the first throw, (b) a four on the second throw, (c) a three
on the first throw and a four on the second throw, (d) either a three on the
first or a four on the second throw ?
(Guru Nanak Dev. Uni. B.Com., 1979)
Ans. (a) 1/6, (b) 1/6, (c) 1/36, (d) 11/36

23. A university has to select one examiner from a list of 50 persons,
20 of them women and 30 men, 10 of them knowing Hindi and 40 not, 15 of
them being teachers and the remaining 35 not. What is the probability of the
university selecting a Hindi-knowing woman teacher ?
Ans. 20/50 × 10/50 × 15/50 = 3/125


24. A man wants to marry a girl having qualities : white complexion,
the probability of getting such a girl is one in twenty ; handsome dowry, the
probability of getting this is one in fifty ; westernised manners and etiquettes, the
probability here is one in hundred. Find the probability of his getting married
to such a girl when the possession of these three attributes is independent.

Ans. 0.00001

25. An electronic device is made up of three components A, B and C.
The probability of failure of the component A is 0.01, that of B is 0.1 and that
of C is 0.02 in some fixed period of time. Find the probability that the device
will work satisfactorily during that period of time assuming that the three
components work independently of one another.
(Bombay Uni. B. Com., April 1972)

Ans. 0.99 × 0.9 × 0.98 = 0.8732


26. Lloyd, the captain of the West Indies cricket team, is reported to
have observed the rule of calling 'heads' every time the toss was made during
the five matches of the last Test series with the Indian team. What is the
probability of his winning the toss in all the five matches ?

How will the probability be affected if he had made a rule of tossing a
coin privately to decide whether to call 'heads' or 'tails' on each occasion ?

Ans. 1/32 ; unaffected.
27. The probability that a man will be alive in 25 years is 3/5, and the
probability that his wife will be alive in 25 years is 2/3. Find the probability that
(a) both will be alive, (b) only the man will be alive, (c) only the wife will be
alive, (d) at least one will be alive, (e) none will be alive, 25 years hence.

[Punjab Uni. M.A. (Econ.), October 1980]

Ans. (a) 2/5, (b) 1/5, (c) 4/15, (d) 13/15, (e) 2/15.

28. A husband and wife appear in an interview for two vacancies in the
same post. The probability of husband's selection is 1/7 and that of wife's
selection is 1/5. What is the probability that :

(a) Both of them will be selected.

(b) Only one of them will be selected.

(c) None of them will be selected ?

[Punjab Uni. B.Com., 1980]
Ans. (a) 1/7 × 1/5 = 1/35, (b) 1/7 × 4/5 + 6/7 × 1/5 = 10/35 = 2/7, (c) 6/7 × 4/5 = 24/35.
29. A bag contains 8 white and 7 black balls. 4 balls are drawn one by
one without replacement. What is the probability that white and black balls
appear alternately ?

(Bombay Uni. B.Com., April 1983)
Ans. 14/195

30. A bag contains 5 white and 3 black balls, and 4 are successively
drawn and not replaced ; what is the probability that they are alternately of
different colours ? [Delhi Uni. B.A. (Econ. Hons. I), 1979]

Ans. 1/7


31. Three groups of workers contain 3 men and 1 woman, 2 men and 2
women, and 1 man and 3 women, respectively. One worker is selected at
random from each group. What is the probability that the group selected consists
of 1 man and 2 women ?

[Nagpur Uni. M.Com. 1976 ; Meerut Uni. M.Com., 1975 ;
Delhi Uni. M.Com., 1971]
Ans. 13/32

32. An urn contains 10 white and 6 black balls. Find the probability
that a blindfolded person in one draw shall obtain a white ball, and in the
second draw (without replacing the first one) a black ball.

(Punjab Uni. B.Com., Sept. 1981)

33. The following table gives the details of the consumer preference for
a new product to be introduced in the market :

                 No. of consumers
            Like     Dislike     Neutral
Male        500      250         125
Female      200      350          75

What is the probability that a consumer selected at random is :
(i) a male who disliked the product ?
(ii) one who liked the product, given that the person is a female ?
(iii) either male or one who disliked the product ?
(Bombay Uni. B.Com., May 1982)
Ans. (i) 1/6, (ii) 8/25, (iii) 49/60

34. The personnel department of a company has records which show the
following analysis of its 200 engineers.

Age          Bachelor's degree only    Master's degree    Total
Under 30              90                     10             100
30 to 40              20                     30              50
Over 40               40                     10              50
Total                150                     50             200

If one engineer is selected at random from the company, find :
(a) The probability he has only a bachelor's degree.
(b) The probability he has a master's degree, given that he is over 40.
(c) The probability he is under 30, given that he has only a bachelor's
degree. [Punjab Uni. B.Com., 1979 ; Delhi Uni. M.B.A., 1977]

Ans. (a) 150/200 = 0.75, (b) 10/50 = 0.2, (c) 90/150 = 0.6


OBJECTIVE TYPE QUESTIONS 
35, Pick out the correct answer with reasoning : 


(i) Two dice are thrown and the sums of the numbers on the faces up 
are obtained. The probability of this sum being 2 is : 


(a) 4 » (5) (с) к ‚ (4) None of these. 
[С.А. (Intermediate), May 1982] 


(ii) A die is thrown two times and the sum of numbers on the faces
up is noted. The probability of this sum being 11 is


=i 
36" 


"rd T 
6 '36' i" none of these. 
Ans. (i) 1/36, (ii) 1/18 [С.А. (Intermediate), N.S. Nov. 1982] | 


36. Two events A and B are mutually exclusive :
P(A) = 1/5 and P(B) = 1/3. Find the probability that :

(i) Either A or B will occur.

(ii) Both A and B will occur.

(iii) Neither A nor B will occur.
(Shivaji Uni. B. Com., Nov. 1980)

Ans. (i) 8/15, (ii) 0, (iii) 7/15
37. Point out the error in the following statement :

The probability that a student will commit exactly one mistake during
his laboratory experiments is 0.08 and the probability that he will commit at
least one mistake is 0.05.
[Delhi Uni. B.A. (Econ. Hons. II), 1983]
Ans. Wrong ; the latter probability cannot be less than the former.
38. If P(AB) is equal to 0.24 and P(A) is equal to 0.60, then P(B/A)


е Я 0°84 N f th 
асаа) o TCA. mermediate), Nov. 1983] 
Ans. (d) 


39. The chance of drawing a white ball in the first draw and again a 
white ball in the second draw without replacement of the ball in the first draw 
from a bag containing 6 white and 4 red balls is 
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(а) 2/10 (6) 600 (c) 36/100 ae A. (Intermediate), May 1983] 
Ans. (d). 


40. Given that A, B, C are mutually exclusive events, explain why each
of the following is not a permissible assignment of probabilities.

(i) P(A) = 0.24, P(B) = 0.4 and P(A ∪ C) = 0.2
(ii) P(A) = 0.7, P(B) = 0.1 and P(B ∩ C) = 0.3
(iii) P(A) = 0.6, P(A ∩ B̄) = 0.5.


(i) Since A, B, C are mutually exclusive, we must have

P(A ∪ C) = P(A) + P(C) ≥ P(A),
which is not so in this case, since P(A ∪ C) = 0.2 is less than P(A) = 0.24.


(ii) P(B ∩ C) must be zero (since B ∩ C = φ).

(iii) P(A ∩ B) = P(A) - P(A ∩ B̄) = 0.1, which is not possible since A and
B are mutually exclusive and hence P(A ∩ B) = 0.

Random Variable, Probability
Distributions and Mathematical
Expectation


13.1. Random Variable. Intuitively, by a random variable
(r.v.) we mean a real number X associated with the outcomes of a
random experiment. It can take any one of the various possible
values each with a definite probability. For example, in a throw of
a die if X denotes the number obtained then X is a random variable
which can take any one of the values 1, 2, 3, 4, 5 or 6, each
with equal probability 1/6. Similarly, in a toss of a coin if X denotes
the number of heads, then X is a random variable which can take
any one of the two values : 0 (no head, i.e., tail) or 1 (i.e., head),
each with equal probability 1/2.

Let us now consider a random experiment of three tosses of a
coin (or three coins tossed simultaneously). Then the sample space
S consists of 2³ = 8 points as given below :

S = {(H, T) × (H, T) × (H, T)}
  = {(HH, HT, TH, TT) × (H, T)}
  = {HHH, HTH, THH, TTH, HHT, HTT, THT, TTT}

Let us consider the variable X, which is the number of heads
obtained. Then X is a random variable which can take any one of
the values 0, 1, 2 or 3.

Outcome    :  HHH  HTH  THH  TTH  HHT  HTT  THT  TTT
Value of X :   3    2    2    1    2    1    1    0

If the sample points in the above order be denoted by w₁, w₂,
w₃, ..., w₈ then to each outcome w of the random experiment, we
can assign a real number X = X(w). For example,

X(w₁) = 3, X(w₂) = 2, X(w₃) = 2, ..., X(w₈) = 0.

Thus, rigorously speaking, random variable may be defined as
a real valued function on the sample space, taking values on the real
line R(-∞, ∞). In other words, random variable is a function
which takes real values which are determined by the outcomes of
the random experiment.
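
The idea of a random variable as a function on the sample space can be made concrete in a few lines. The following Python sketch is illustrative only: it enumerates the 8 outcomes of three tosses, applies the function X (number of heads) to each, and collects the resulting probabilities on the assumption that all outcomes are equally likely. The names sample_space, X and pmf are assumptions made here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three tosses: every string of H/T of length 3.
sample_space = ["".join(t) for t in product("HT", repeat=3)]


def X(outcome):
    """The random variable: number of heads in the outcome."""
    return outcome.count("H")


# X assigns a real number to every sample point.
values = {w: X(w) for w in sample_space}

# With equally likely outcomes, P(X = x) = (favourable points) / 8.
counts = Counter(values.values())
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())}

print(values)
print(pmf)
```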


Remarks : 1. A random variable is denoted by the capital
letters X, Y, Z, ... etc., of the English alphabet and particular values
which the random variable takes are denoted by the corresponding
small letters of the English alphabet.

2. It should be clearly understood that the actual value which
the variable assumes is not a random variable. For example, in three
tosses of a coin, the number of heads obtained is a random variable
which can take any one of the four values 0, 1, 2 or 3 as long
as the coin is not tossed. But after it is tossed and we get two
heads, then 2 is not a random variable.

3. Discrete and Continuous Random Variables. If the random
variable X assumes only a finite or countably infinite set of values
it is known as discrete random variable. For example, marks
obtained by students in a test, the number of students in a college,
the number of defective mangoes in a basket of mangoes, number
of accidents taking place on a busy road, etc., are all discrete
random variables.

On the other hand, if the r.v. X can assume infinite and
uncountable set of values it is said to be a continuous r.v., e.g., the
age, height or weight of students in a class are all continuous
random variables. In case of a continuous random variable we
usually talk of the value in a particular interval and not at a point.

Generally discrete r.v.'s represent counted data while continuous
r.v.'s represent measured data.


13.2. Probability Distribution of a Random Variable. Let us
consider a discrete r.v. X which can take the possible values x₁, x₂,
x₃, ..., xₙ. With each value of the variable X, we associate a
number

pᵢ = P(X = xᵢ) ; i = 1, 2, ..., n

which is known as the probability of xᵢ and satisfies the following
conditions :

(i) pᵢ = P(X = xᵢ) ≥ 0, (i = 1, 2, ..., n)              ...(13.1)
i.e., pᵢ's are all non-negative and
(ii) Σpᵢ = p₁ + p₂ + ... + pₙ = 1,                     ...(13.2)

i.e., the total probability is one.

More specifically, let X be a discrete random variable and
define :

p(x) = P(X = x)

such that p(x) ≥ 0 and Σ p(x) = 1, summation being taken over
various values of the variable.

The function pᵢ = P(X = xᵢ) or p(x) is called the probability
function or more precisely probability mass function (p.m.f.) of the
random variable X and the set of all possible ordered pairs {x, p(x)}
is called the probability distribution of the random variable X.

Remarks : 1. The concept of probability distribution is
analogous to that of frequency distribution. Just as frequency
distribution tells us how the total frequency is distributed among
different values (or classes) of the variable, similarly a probability
distribution tells us how total probability of 1 is distributed among the
various values which the random variable can take. It is usually
represented in a tabular form given below :

x    :  x₁   x₂   x₃   ...   xₙ
p(x) :  p₁   p₂   p₃   ...   pₙ

2. Probability Density Function (Continuous r.v.). In case
of a continuous random variable, we do not talk of probability
at a particular point (which is always zero) but we always talk
of probability in an interval. If p(x)dx is the probability that
the random variable X takes the value in a small interval of
magnitude dx, e.g., (x, x + dx) or (x - dx/2, x + dx/2), then p(x) is
called the probability density function (p.d.f.) of the r.v. X.


13.3. Distribution Function or Cumulative Probability Function.
If X is a discrete r.v. with probability function p(x) then the
distribution function, usually denoted by F(x), is defined as :

F(x) = P(X ≤ x)                                        ...(13.3)
If X takes integral values, viz., 1, 2, 3, ... then
F(x) = P(X = 1) + P(X = 2) + ... + P(X = x)
⇒ F(x) = p(1) + p(2) + p(3) + ... + p(x)              ...(13.4)
Remarks. 1. In the above case,
F(x - 1) = p(1) + p(2) + ... + p(x - 1)
∴ F(x) - F(x - 1) = p(x)
⇒ p(x) = F(x) - F(x - 1)                              ...(13.5)
Hence if X is a random variable which can take only positive
integral values then probability function can be obtained from
distribution function by using (13.5).
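
Relations (13.4) and (13.5) amount to a running sum and its first difference. The sketch below illustrates this in Python for a fair die, which is an assumed example chosen here; the names pmf, cdf and recovered are likewise illustrative.

```python
from fractions import Fraction
from itertools import accumulate

# Probability function of a fair die: p(x) = 1/6 for x = 1, ..., 6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
xs = sorted(pmf)

# (13.4): F(x) = p(1) + p(2) + ... + p(x), a cumulative sum.
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))

# (13.5): p(x) = F(x) - F(x - 1), taking F(0) = 0.
recovered = {x: cdf[x] - cdf.get(x - 1, Fraction(0)) for x in xs}

print(cdf)
print(recovered == pmf)
```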
2. If X is a continuous r.v. with probability density function
p(x), then the distribution function is given by the integral
F(x) = ∫ p(x) dx, taken from -∞ to x                   ...(13.5a)

13.4. Moments. If X is a discrete r.v. with probability
function p(x), then :
μ'ᵣ = rth moment about any arbitrary point 'A'
    = Σ(x - A)ʳ p(x)                                   ...(13.6)
μᵣ = rth moment about mean (x̄)
   = Σ(x - x̄)ʳ p(x)                                   ...(13.7)
In particular,
Mean (x̄) = First moment about origin
         = Σ x p(x),                                   ...(13.8)
taking A = 0 and r = 1 in (13.6).
Variance (x) = μ₂ = Σ(x - x̄)² p(x)                    ...(13.9)

In the expressions from (13.6) to (13.9), the summation is
taken over the various values of the r.v. X.

In the case of continuous r.v. with p.d.f. p(x), the above
formulae hold with the only difference that summation is replaced by
integration over the values of the variable.
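
Formulae (13.6) to (13.9) for a discrete r.v. reduce to one weighted sum. A minimal Python sketch follows; the function name moment, its about parameter and the fair-die distribution used as input are assumptions introduced for illustration.

```python
from fractions import Fraction


def moment(pmf, r, about=0):
    """rth moment about the point `about`: sum of (x - A)^r p(x), as in (13.6)."""
    return sum((x - about) ** r * p for x, p in pmf.items())


# Illustrative distribution: a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = moment(pmf, 1)                   # (13.8): A = 0, r = 1
variance = moment(pmf, 2, about=mean)   # (13.9): second moment about the mean

print(mean, variance)
```

The first moment about the mean, moment(pmf, 1, about=mean), is always zero, which is a convenient sanity check on any tabulated distribution.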


Example 13.1. A die is tossed twice. Getting 'an odd number'
is termed as a success. Find the probability distribution of the
number of successes.

Solution. Since the cases favourable to getting an odd
number in a throw of a die are (1, 3, 5), i.e., 3 in all,

Probability of success (S) = 3/6 = 1/2
Probability of failure (F) = 1 - 1/2 = 1/2

If X denotes the number of successes in two throws of a die,
then X is a random variable which takes the values 0, 1, 2.

P(X = 0) = P[F in 1st throw and F in 2nd throw] = P(FF)
         = P(F) × P(F) = 1/2 × 1/2 = 1/4

P(X = 1) = P(S and F) + P(F and S)
         = P(S)P(F) + P(F)P(S)
         = 1/2 × 1/2 + 1/2 × 1/2 = 1/2

P(X = 2) = P(S and S) = P(S)P(S) = 1/2 × 1/2 = 1/4

Hence the probability distribution of X is given by :

x    :  0     1     2
p(x) :  1/4   1/2   1/4

Example 13.2. Two cards are drawn

(a) successively with replacement
(b) simultaneously (successively without replacement),
from a well shuffled deck of 52 cards. Find the probability distribution
of the number of aces.

Solution. Let X denote the number of aces obtained in a
draw of two cards. Obviously X is a random variable which can take
the values 0, 1 or 2.

(a) Probability of drawing an ace = 4/52 = 1/13

Probability of drawing a non-ace = 1 - 1/13 = 12/13

Since the cards are replaced, the two draws are independent.

P(X = 2) = P(Ace and Ace) = P(Ace) × P(Ace)
         = 1/13 × 1/13 = 1/169
P(X = 1) = P(Ace and Non-ace) + P(Non-ace and Ace)
         = P(Ace) × P(Non-ace) + P(Non-ace) × P(Ace)
         = 1/13 × 12/13 + 12/13 × 1/13 = 24/169
P(X = 0) = P(Non-ace and Non-ace)
         = P(Non-ace) × P(Non-ace)
         = 12/13 × 12/13 = 144/169
Hence we have the probability distribution :

x    :  0         1        2
p(x) :  144/169   24/169   1/169

(b) If cards are drawn without replacement, then exhaustive
number of cases of drawing 2 cards out of 52 cards is ⁵²C₂.

P(X = 0) = P(No ace) = P(Both cards are non-aces)
         = ⁴⁸C₂/⁵²C₂ = (48 × 47)/(52 × 51) = 188/221
P(X = 1) = P(one ace)
         = P(one ace and one non-ace)
         = (⁴C₁ × ⁴⁸C₁)/⁵²C₂ = (4 × 48 × 2)/(52 × 51) = 32/221
P(X = 2) = P(both aces)
         = ⁴C₂/⁵²C₂ = (4 × 3)/(52 × 51) = 1/221
Hence the probability distribution of X becomes :

x    :  0         1        2
p(x) :  188/221   32/221   1/221
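
The contrast between drawing with and without replacement in Example 13.2 can be reproduced as follows. This is a hedged sketch in Python: the dictionary names are assumptions made here, and the two formulae used are the standard products of independent draws for case (a) and ratios of combinations for case (b), matching the working in the solution.

```python
from fractions import Fraction
from math import comb

p_ace = Fraction(4, 52)   # probability of an ace in a single draw

# (a) With replacement: the draws are independent, so the number of
# aces in 2 draws has probabilities C(2, k) p^k (1 - p)^(2 - k).
with_replacement = {
    k: comb(2, k) * p_ace ** k * (1 - p_ace) ** (2 - k) for k in range(3)
}

# (b) Without replacement: choose k of the 4 aces and 2 - k of the
# 48 non-aces, out of all ways of choosing 2 cards from 52.
without_replacement = {
    k: Fraction(comb(4, k) * comb(48, 2 - k), comb(52, 2)) for k in range(3)
}

print(with_replacement)
print(without_replacement)
```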


Example 13.3. Obtain the probability distribution of X, the
number of heads in three tosses of a coin (or a simultaneous toss of
three coins).

Solution. Obviously, X is a random variable which can take
the values 0, 1, 2 or 3. The sample space S consists of 2³ = 8 sample
points, as given below :

S = {(H, T) × (H, T) × (H, T)}
  = {(HH, HT, TH, TT) × (H, T)}
  = {HHH, HTH, THH, TTH, HHT, HTT, THT, TTT}

The probability distribution is given below :

No. of heads    Favourable            No. of favourable    Probability
(x)             events                cases                p(x)
0               (TTT)                 1                    1/8
1               (TTH, HTT, THT)       3                    3/8
2               (HTH, THH, HHT)       3                    3/8
3               (HHH)                 1                    1/8
Example 13.4. Two dice are rolled at random. Obtain the
probability distribution of the sum of the numbers on them.

Solution. When two dice are rolled, the sample space S
consists of 6² = 36 sample points as given below :

S = { (1, 1), (1, 2), ..., (1, 6)
      (2, 1), (2, 2), ..., (2, 6)
      ...
      (6, 1), (6, 2), ..., (6, 6) }

Let X denote the sum of the numbers on the two dice. Then
X is a random variable which can take the values 2, 3, 4, ..., 12
with the probability distribution given below.

Sum of numbers   Favourable sample points                          No. of favourable   Probability
(x)                                                                cases               p(x)
2                (1, 1)                                            1                   1/36
3                (1, 2), (2, 1)                                    2                   2/36
4                (1, 3), (2, 2), (3, 1)                            3                   3/36
5                (1, 4), (2, 3), (3, 2), (4, 1)                    4                   4/36
6                (1, 5), (2, 4), (3, 3), (4, 2), (5, 1)            5                   5/36
7                (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)    6                   6/36
8                (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)            5                   5/36
9                (3, 6), (4, 5), (5, 4), (6, 3)                    4                   4/36
10               (4, 6), (5, 5), (6, 4)                            3                   3/36
11               (5, 6), (6, 5)                                    2                   2/36
12               (6, 6)                                            1                   1/36

Example 13.5. Four bad apples are mixed accidentally with
20 good apples. Obtain the probability distribution of the number of
bad apples in a draw of 2 apples at random.

Solution. Let X denote the number of bad apples drawn.
Then X is a random variable which can take the values 0, 1 or 2.

There are 4 + 20 = 24 apples in all and the exhaustive number
of cases of drawing two apples is ²⁴C₂.

P(X = 0) = ²⁰C₂/²⁴C₂ = (20 × 19)/(24 × 23) = 95/138

P(X = 1) = (⁴C₁ × ²⁰C₁)/²⁴C₂ = (2 × 4 × 20)/(24 × 23) = 40/138

P(X = 2) = ⁴C₂/²⁴C₂ = (4 × 3)/(24 × 23) = 3/138

Hence the probability distribution of X is :

x    :  0        1        2
p(x) :  95/138   40/138   3/138
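
Example 13.5 and several of the exercises that follow share one pattern: a sample is drawn without replacement from a lot containing two kinds of items. A small helper covers all of them. This is an illustrative Python sketch; the function name hypergeom_pmf and its parameter names are assumptions introduced here.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(bad, good, draws):
    """Distribution of the number of bad items in `draws` items drawn
    at random, without replacement, from `bad + good` items."""
    exhaustive = comb(bad + good, draws)
    return {
        k: Fraction(comb(bad, k) * comb(good, draws - k), exhaustive)
        for k in range(min(bad, draws) + 1)
    }


# Example 13.5: 4 bad and 20 good apples, 2 drawn.
apples = hypergeom_pmf(4, 20, 2)
print(apples)
```

For instance, hypergeom_pmf(5, 20, 4) gives the distribution asked for in the mango problem of Exercise 13.1.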


EXERCISE 13.1

1. Define a random variable and its probability distribution. Explain
by means of two examples.

2. State, with reasons, if the following probability distributions are
admissible or not.

(i)   x    :  0     1     2
      p(x) :  0.3   0.2   0.5

(ii)  x    :  -1    0     1
      p(x) :  0.4   0.4   0.3

(iii) x    :  0     1     2     3
      p(x) :  0.2   0.3   0.3   0.1

(iv)  x    :  -2    -1    0      1     2
      p(x) :  0.3   0.4   -0.2   0.2   0.3

Ans. (i) Yes. (ii) No, since Σp(x) > 1. (iii) No, since Σp(x) < 1. (iv) No,
since p(0) = -0.2 which is not possible.


3. Two dice are thrown simultaneously and *getting a number less tnan 
3’ on a die is termed asa success. Obtain the probability distribution of the 
number of successes, 


Ans. me 0 1 2 


P(x): 4/9 4/9 1/9 


4. Obtain the probability distribution of the number of sixes in two 
tosses o fa die. 


Ans. d 0 1 2 
ЕРЕ 10 ГАК 
р): —35 36 


36 


5. Obtain the probability distribution of number of heads in two tosses 
of a coin, 


Ans. x: 0 1 2 
р(х): 1/4 214 1/4 

6. Three cards are drawn at random successively, with replacement, 
from a well shuffled pack of cards. Getting ‘a card of diamonds’ is termed as 
a success. Obtain the probability distribution of the number of successes. 

Ans.    x    :   0       1       2      3 
        p(x) :   27/64   27/64   9/64   1/64 

7. Two cards are drawn without replacement, from a well shuffled pack 
of cards. Obtain the probability distribution of the number of face cards (Jack, 
Queen, King and Ace). 

Ans.    x    :   0            1                   2 
        p(x) :   ³⁶C₂/⁵²C₂    (¹⁶C₁×³⁶C₁)/⁵²C₂    ¹⁶C₂/⁵²C₂ 
             =   105/221      96/221              20/221 


8. Five defective mangoes are accidentally mixed with twenty good ones 
and by looking at them it is not possible to differentiate between them. Four 
mangoes are drawn at random from the lot. Find the probability distribution 
of X, the number of defective mangoes. 

Ans.    x    :   0           1                  2                  3                  4 
        p(x) :   ²⁰C₄/²⁵C₄   (⁵C₁×²⁰C₃)/²⁵C₄    (⁵C₂×²⁰C₂)/²⁵C₄    (⁵C₃×²⁰C₁)/²⁵C₄    ⁵C₄/²⁵C₄ 
             =   969/2530    1140/2530          380/2530           40/2530            1/2530 


9. Two bad eggs are mixed accidentally with 10 good ones and three are 
drawn at random from the lot. Obtain the probability distribution of the 
number of bad eggs drawn. 

Ans.    x    :   0                  1                  2 
        p(x) :   (²C₀×¹⁰C₃)/¹²C₃    (²C₁×¹⁰C₂)/¹²C₃    (²C₂×¹⁰C₁)/¹²C₃ 
             =   6/11               9/22               1/22 

10. An urn contains 6 red and 4 white balls. Three balls are drawn at 
random. Obtain the probability distribution of the number of white balls 
drawn. 

Ans.    x    :   0      1       2      3 
        p(x) :   5/30   15/30   9/30   1/30 


13.5. Mathematical Expectation. If X is a random variable 
which can assume any one of the values x₁, x₂, ..., xₙ with respective 
probabilities p₁, p₂, ..., pₙ, then the mathematical expectation of X, 
usually called the expected value of X and denoted by E(X), is defined 
as : 

E(X) = p₁x₁ + p₂x₂ + ... + pₙxₙ = Σpx ...(13.10) 
where Σp = p₁ + p₂ + ... + pₙ = 1 ...(13.11) 

More precisely, if X is a random variable with probability 
distribution {x, p(x)}, then 

E(X) = Σ x p(x), ...(13.12) 
summation being taken over different values of X. 
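
Formula (13.12) translates directly into a few lines of Python. The sketch below is only an illustration; the helper name `expectation` and the representation of a distribution as a dictionary {x: p(x)} are assumptions made for this example.

```python
from fractions import Fraction

def expectation(dist):
    """E(X) = sum of x * p(x) over a distribution given as {x: p(x)}."""
    assert sum(dist.values()) == 1, "probabilities must add up to 1"
    return sum(x * p for x, p in dist.items())

# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(die))  # 7/2
```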
Physical Interpretation of E(X). 


Mathematical expectation of a random variable is nothing 
but its arithmetic mean. 


Remarks: 1. The term ‘expected value’ is unfortunate in 
that it is not in any sense a value which one expects to occur in a 
particular experiment. But if an experiment is conducted repeatedly 
a large number of times under essentially homogeneous conditions, 
then the average of the actual outcomes approaches the expected value. 
Sometimes, expected value may give results which are impossible or 
absurd. For example, the expected value of the number in a random 
throw of a die is 7/2=3.5 [c.f. Example 13.6] ; the expected 
value of the number of heads in three tosses of a coin is 3/2 [c.f. 
Question 10] ; the expected number of white balls drawn in a draw 
of 2 balls from an urn containing 7 white and 3 red balls is 1.4 
[c.f. Example 13.8]. Taken as the outcome of a single trial, these 
results are unrealistic and absurd. 


2. In a game of chance, suppose that a player gains a sum 
‘a’ if he wins and loses a sum ‘b’ if he does not win, i.e., if he fails. 
If p and q are probabilities of his success and failure respectively in 
a single trial, then regarding loss as negative gain, the mathematical 
expectation of his gain is 

a×p + (−b)×q = ap − bq ...(13.13) 

If the mathematical expectation of the gain of the player is 
zero, the game is said to be ‘fair’. If the mathematical expectation 
of the gain is greater than zero, then the game is said to be biased in 
favour of the player and if the expectation is negative, the game is 
said to be biased against the player. 

We shall now state some theorems without proof. The proofs 
are beyond the scope of the book. 

Theorems on Expectation 

Theorem 13.1. E(c)=c, ...(13.14) 
where c is a constant. 

Theorem 13.2. E(cX)=cE(X), ...(13.15) 
where c is a constant. 

Theorem 13.3. E(aX+b)=aE(X)+b, ...(13.16) 
where a and b are constants. 

Theorem 13.4. (Addition Law of Expectation). If X and Y are 
random variables then 

E(X+Y)=E(X)+E(Y), ...(13.17) 
i.e., expected value of the sum of two random variables is equal to 
the sum of their expected values. 

The result can be generalised to n variables. If X₁, X₂, ..., Xₙ 
are n random variables, then 
E(X₁+X₂+...+Xₙ)=E(X₁)+E(X₂)+...+E(Xₙ) ...(13.18) 
or simply E(ΣX)=ΣE(X) ...(13.19) 

Corollary. E(aX+bY)=aE(X)+bE(Y), ...(13.20) 
where a and b are constants. 


Theorem 13.5. (Multiplication Law of Expectation). If X and 
Y are independent random variables, then 

E(XY)=E(X) . E(Y) ...(13.21) 
i.e., the expected value of the product of two independent random 
variables is equal to the product of their expected values. 

In general, if X₁, X₂, ..., Xₙ are n independent random 
variables, then 

E(X₁X₂ ... Xₙ)=E(X₁) . E(X₂) ... E(Xₙ) ...(13.22) 

Remark. It should be borne in mind that the multiplication 
theorem of expectation holds only for independent random variables 
while no such condition on the variables is required for the addition 
theorem of expectation. 
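
The two laws can be checked numerically on a pair of fair dice. The following Python sketch is an illustration only; the variable names are our own. It also shows why independence matters: when Y is taken to be the same throw as X, the product rule no longer holds.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # probability of each ordered pair (x, y) of two fair dice

E_X = sum(Fraction(x, 6) for x in faces)
E_Y = E_X

E_sum = sum((x + y) * p for x, y in product(faces, faces))
E_prod = sum(x * y * p for x, y in product(faces, faces))

# Dependent case: Y is the same throw as X, so XY = X squared.
E_X_squared = sum(Fraction(x * x, 6) for x in faces)

print(E_sum == E_X + E_Y)        # addition law
print(E_prod == E_X * E_Y)       # multiplication law (independent dice)
print(E_X_squared == E_X * E_X)  # not equal when the variables are dependent
```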


13.6. Variance of X in terms of Expectation. We have 

σ² = Var(X) = E[X−E(X)]² ...(13.23) 

Also we have a simplified expression for σ² given by 

σ² = E(X²)−[E(X)]² ...(13.24) 

For a probability distribution {x, p(x)}, we have : 
Mean = E(X) = Σ x p(x) ...(13.25) 
and σ² = E(X²)−[E(X)]² 
= Σ x² p(x)−[Σ x p(x)]² ...(13.26) 

Theorem 13.6. Var (X+c)=Var (X), ...(13.27) 
where c is a constant. 

This theorem shows that variance is independent of change of 
origin. 

Theorem 13.7. Var (aX)=a² . Var (X) ...(13.28) 
where a is a constant. 

This theorem shows that variance is not independent of change 
of scale. 

Corollary. Combining the results of the above two theorems 
we get : 
Var (aX+b)=a² Var (X) ...(13.29) 

Theorem 13.8. Var (c)=0, ...(13.30) 
where c is a constant. 
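
A small Python sketch can illustrate formula (13.26) and the effect of a change of origin and scale. The helper names `mean`, `variance` and `transform` are assumptions introduced for this illustration.

```python
from fractions import Fraction

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2, as in (13.26)."""
    return sum(x * x * p for x, p in dist.items()) - mean(dist) ** 2

def transform(dist, a, b):
    """Distribution of aX + b (a != 0 keeps the values distinct)."""
    return {a * x + b: p for x, p in dist.items()}

die = {face: Fraction(1, 6) for face in range(1, 7)}
print(variance(die))                    # 35/12
print(variance(transform(die, 1, 10)))  # change of origin: unchanged
print(variance(transform(die, 3, 10)))  # change of scale: 9 times larger
```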
Example 13.6. A die is thrown at random. What is the expectation 
of the number on it ? 
Solution. Let X denote the number on the die. Then X is a 
random variable which can take any one of the values 1, 2, 3, ..., 6 
each with equal probability 1/6 as given below : 

x    :   1     2     3     4     5     6 
p(x) :   1/6   1/6   1/6   1/6   1/6   1/6 

∴ E(X) = Σ x . p(x) 
= 1×1/6 + 2×1/6 + 3×1/6 + 4×1/6 + 5×1/6 + 6×1/6 
= (1/6)(1+2+3+4+5+6) 
= 21/6 = 7/2 
Example 13.7. A random variable X is defined as the sum of 
faces when a pair of dice is thrown. Find the expected value of X. 

[Punjab Uni. M.A. (Econ.), 1982 ; 
Calcutta Uni. B.A. (Econ. Hons.), 1979] 

Solution. Let the random variable X denote the sum of points 
obtained on a pair of dice when thrown. Then the probability 
distribution of X is (c.f. Example 13.4) : 

x    :   2      3      4      5      6      7      8      9      10     11     12 
p(x) :   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36 

∴ E(X) = Σ x . p(x) 
= 2×1/36 + 3×2/36 + 4×3/36 + 5×4/36 + 6×5/36 + 7×6/36 
  + 8×5/36 + 9×4/36 + 10×3/36 + 11×2/36 + 12×1/36 
= (1/36) [2+6+12+20+30+42+40+36+30+22+12] 
= 252/36 = 7 
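
The distribution of the sum and its expected value can also be obtained by listing all 36 equally likely outcomes. The Python sketch below is an illustration under that assumption of fair, independent dice; the variable names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

expected_sum = sum(x * p for x, p in dist.items())
print(expected_sum)  # 7
```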


Example 13.8. An urn contains 7 white and 3 red balls. Two 
balls are drawn together, at random, from this urn. Compute the 
probability that neither of them is white. Find also the probability of 
getting one white and one red ball. Hence compute the expected 
number of white balls drawn. 


[Calcutta Uni. B.A. (Econ. Hons.), 1978] 


Solution. From an urn containing 7 white and 3 red balls, 
two balls can be drawn in ¹⁰C₂ ways. Let X denote the number of 
white balls drawn. 

p(0) = Probability that neither of two balls is white 
= Probability that both balls drawn are red 
= ³C₂ / ¹⁰C₂ = (3×2)/(10×9) = 3/45 = 1/15, 
since 2 balls can be drawn out of 3 red balls in ³C₂ ways. 

p(1) = Probability of getting 1 white and 1 red ball 
= (⁷C₁ × ³C₁) / ¹⁰C₂ = (7×3×2)/(10×9) = 21/45 = 7/15, 
since 1 white ball can be drawn out of 7 white balls in ⁷C₁ ways 
and 1 red ball can be drawn out of 3 red balls in ³C₁ ways and all 
these ways can be associated with each other. 

Similarly, we have : 

p(2) = Probability of getting two white balls 
= ⁷C₂ / ¹⁰C₂ = (7×6)/(10×9) = 21/45 = 7/15 

Hence expected number of white balls drawn is : 
E(X) = Σ x . p(x) 
= 0×3/45 + 1×21/45 + 2×21/45 
= 63/45 = 7/5 
= 1.4 


Example 13.9. Anil Company estimates the net profit on a new 
product it is launching to be Rs. 30,00,000 during the first year if it is 
‘successful’ ; Rs. 10,00,000 if it is ‘moderately successful’ and a loss of 
Rs. 10,00,000 if it is ‘unsuccessful’. The firm assigns the following 
probabilities to first year prospects for the product. Successful : 0.15, 
moderately successful : 0.25. What are the expected value and standard 
deviation of first year net profit for this product ? 

[C.A. (Final), May 1979] 

Solution. Regarding loss as negative profit, and noting that the 
probability of the product being unsuccessful is 1−0.15−0.25=0.60, 
the probability distribution of net profit (x) on the new product in 
the first year is given to be : 

CALCULATIONS FOR VARIANCE 

Profit (in million Rs.)   Probability   x . p(x)   x² . p(x) 
(x)                       p(x) 

 3                        0.15           0.45      1.35 
 1                        0.25           0.25      0.25 
−1                        0.60          −0.60      0.60 

Total                     1.00           0.10      2.20 

∴ E(x) = Σ x . p(x) = 0.10 (million Rs.) = Rs. 1,00,000 
Var(x) = Σ x² . p(x) − [Σ x . p(x)]² 
= 2.20 − (0.10)² = 2.19 
⇒ σ = √2.19 = 1.48 (million Rs.) 
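
The figures of Example 13.9 can be reproduced with the following Python sketch. It is only a check on the arithmetic; the dictionary `profit` is our own representation, and the probability 0.60 for the unsuccessful case is inferred as 1 − 0.15 − 0.25.

```python
from math import sqrt

# Net profit in million Rs. -> probability; 0.60 is 1 - 0.15 - 0.25.
profit = {3.0: 0.15, 1.0: 0.25, -1.0: 0.60}

expected = sum(x * p for x, p in profit.items())
variance = sum(x * x * p for x, p in profit.items()) - expected ** 2
std_dev = sqrt(variance)

print(round(expected, 2), round(variance, 2), round(std_dev, 2))
```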


Example 13.10. A die is tossed twice. Getting ‘a number 
greater than 4’ is considered a success. Find the mean and variance 
of the probability distribution of the number of successes. 

Solution. Since the favourable cases for ‘getting a number 
greater than 4’ in a throw of a die are 5 and 6, i.e., 2 in all, we have : 

Probability of success (S) = 2/6 = 1/3 

Probability of failure (F) = 1 − 1/3 = 2/3 

Let X denote the number of successes. X is a random variable 
taking values 0, 1 and 2. 

P(X=0) = P(F and F) = P(F)×P(F) = 2/3 × 2/3 = 4/9 
P(X=1) = P(F and S)+P(S and F) 
= P(F).P(S)+P(S).P(F) 
= 2/3 × 1/3 + 1/3 × 2/3 = 4/9 
P(X=2) = P(S and S) = P(S).P(S) = 1/3 × 1/3 = 1/9 

COMPUTATION OF VARIANCE 

x       p(x)   x . p(x)   x² . p(x) 
0       4/9    0          0 
1       4/9    4/9        4/9 
2       1/9    2/9        4/9 
Total   1      6/9        8/9 

∴ Mean (μ) = Σ x . p(x) = 6/9 = 2/3 
Variance (σ²) = Σ x² p(x) − [Σ x p(x)]² = 8/9 − (2/3)² = 4/9 


Example 13.11. A player tosses two fair coins. He wins Rs. 5 
if 2 heads occur, Rs. 2 if 1 head occurs and Re. 1 if no head occurs. 

(i) Find his expected gain. 

(ii) How much should he pay to play the game if it is to be 
fair ? 
[Delhi Uni. B.A. (Econ. Hons. I), 1985] 

Solution. In a random toss of two fair coins, the probability 
distribution of the number of heads (x) is as obtained below : 

No. of      Favourable   No. of favourable   Probability   Gain in Rs. 
heads (x)   events       cases               (p)           (y) 

0           TT           1                   1/4=0.25      1 
1           HT, TH       2                   2/4=0.50      2 
2           HH           1                   1/4=0.25      5 

(i) The expected gain of the player is given by : 
E(y) = Σ p . y = 0.25×1 + 0.50×2 + 0.25×5 = 0.25+1+1.25 = Rs. 2.50 
(ii) The game is said to be fair if the mathematical expectation 
of the gain of the player is zero. Hence the player should pay 
Rs. 2.50 to play the game if it is to be fair. 
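
The idea of a fair game can be sketched in Python as follows. The names `expected_prize` and `net_expectation` and the notion of an `entry_fee` argument are assumptions made for this illustration; the sketch only restates that the game is fair when the expected gain, net of the amount paid, is zero.

```python
from fractions import Fraction

# Number of heads in two tosses -> probability, and the prize in Rs.
p_heads = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}
prize = {0: 1, 1: 2, 2: 5}

expected_prize = sum(p_heads[h] * prize[h] for h in p_heads)

def net_expectation(entry_fee):
    """Expected gain after paying a (hypothetical) entry fee."""
    return expected_prize - entry_fee

print(float(expected_prize))            # 2.5
print(net_expectation(Fraction(5, 2)))  # 0, so Rs. 2.50 makes the game fair
```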


EXERCISE 13.2 


1. (a) Define a random variable and its mathematical expectation. 
(b) What is mathematical expectation ? How is it useful to a businessman ? 

2. A random variable X has the following probability distribution : 

x           :   −1    0     1     2 
Probability :   1/3   1/6   1/6   1/3 

Compute the expectation of X. 

Ans. 1/2. [Calcutta Uni. B.A. (Econ. Hons.) 1977] 

3. A random variable X has the following probability function : 

Values of X, x :   −2    −1   0     1    2     3 
p(x)           :   0.1   k    0.2   2k   0.3   k 

Find the value of k, and calculate mean and variance. 
Ans. 0.1 ; 0.8 and 2.16. 

4. A bakery has the following schedule of daily demand for cakes. Find 
the expected number of cakes demanded per day. 
[Bombay Uni. B. Com., April 1981] 

No. of cakes demanded in hundreds : 
Probability : 

Ans. 508. 

5. In a business venture a man can make a profit of Rs. 2,000 with a 
probability of 0.4 or have a loss of Rs. 1,000 with a probability of 0.6. What 
is his expected profit ? [Bombay Uni. B. Com., April 1978] 

Ans. Rs. 200. 

6. If the probability that the value of a certain stock will remain the 
same is 0.46, the probabilities that its value will increase by Re. 0.50 or Re. 1.00 
per share are respectively 0.17 and 0.23, and the probability that its value 
will decrease by Re. 0.25 per share is 0.14, what is the expected gain per 
share ? 
[Delhi Uni. B.A. (Econ. Hons. I), 1984] 
Ans. Re. 0.28 

7. A box contains 12 items of which 3 are defective. A sample of 3 
items is selected at random from this box. If X represents the number of 
defective items of 3 selected items, describe the random variable X completely 
and obtain its expectation. 
[Punjab Uni. M.A. (Econ.) 1981] 

Ans.    x    :   0        1         2        3 
        p(x) :   84/220   108/220   27/220   1/220 
E(X)=0.75. 

8. Obtain the probability distribution of ‘number of sixes’ in two 
tosses of a die. Hence obtain its mean and variance. 

Ans.    x    :   0       1       2 
        p(x) :   25/36   10/36   1/36 
Mean=1/3, Variance=5/18 


9. Compute mean and s.d. for the following probability distributions. 


Ж -8 -1 0 4 


Ans. (a) Mean=−0.6, s.d.=1.85 


10. Obtain the probability distribution of ‘number of heads’ in three 
tosses of a coin. Hence obtain the mean and variance of the distribution. 

Ans.    x    :   0     1     2     3 
        p(x) :   1/8   3/8   3/8   1/8 

Mean=3/2, Variance=3/4. 

11. A die is tossed twice. ‘Getting a number less than 3’ is termed as 
success. Obtain the probability distribution and hence the mean and variance 
of the number of successes. 

Ans. Mean=2/3, Variance=4/9 


12. The monthly demand for transistors is known to have the following 
probability distribution : 

Demand (n)      :   1      2      3      4      5      6 
Probability (p) :   0.10   0.15   0.20   0.25   0.18   0.12 

Determine the expected demand for transistors. Also obtain the variance. 
Suppose that the cost (C) of producing ‘n’ transistors is given by the rule, 
C=10,000+500 n. Determine the expected cost. 
[Madras Uni. M.A. (Econ.), Dec. 1976] 

Ans. Expected demand for transistors is : E(n)=Σ n×p=3.62 

E(C)=E[10,000+500n]=10,000+500 E(n)=10,000+500×3.62=11,810 


14 


Theoretical Distributions 


14.1. Introduction. In Chapter 3 we studied the empirical or 
observed or experimental frequency distributions in which the 
actual data were collected, classified and tabulated in the form of a 
frequency distribution. Such data are usually based on sample 
studies. The statistical measures like the averages, dispersion, 
skewness, kurtosis, correlation, etc., for the sample frequency distri- 
butions not only give us the nature and form of the sample data but 
also help us in formulating certain ideas about the characteristics of 
the population. However, a more scientific way of drawing infere- 
nces about the population characteristics is through the study of 
theoretical distributions which we shall discuss in this chapter. In 
the population, the values of the variable may be distributed accord- 
ing to some definite probability law which can be expressed mathe- 
matically and the corresponding probability distribution is known 
as theoretical probability distribution. Such probability laws may 
be based on ‘a priori’ considerations or ‘a posteriori’ inferences. 
These distributions are based on expectations on the basis of 
previous experience. Theoretical distributions also enable us to 
fit a mathematical model or a function of the form y=p(x) to the 
given data. 


In Chapter 13 we have already defined the random variable, 
mathematical expectation, probability and distribution function, 
moments, mean and variance in terms of probability function. These 
provide us the necessary tools for the study of theoretical distribu- 
tions. In this chapter we shall study the following univariate pro- 
bability distributions : 


(i) Binomial Distribution 
(ii) Poisson Distribution 


(iii) Normal Distribution 


The first two distributions are discrete probability distributions 
and the third is a continuous probability distribution. 

14.2. Binomial Distribution. Binomial distribution is also 
known as the ‘Bernoulli distribution’ after the Swiss mathematician 
James Bernoulli (1654-1705) who discovered it in 1700. It was first 
published in 1713, eight years after his death. This distribution can 
be used under the following conditions : 


(i) The random experiment is performed repeatedly a finite 
and fixed number of times. In other words n, the number of trials, is 
finite and fixed. 

(ii) The outcome of the random experiment (trial) results in 
the dichotomous classification of events. In other words, the outcome 
of each trial may be classified into two mutually disjoint categories, 
called success (the occurrence of the event) and failure (the 
non-occurrence of the event). 

(iii) All the trials are independent, i.e., the result of any trial 
is not affected in any way by the preceding trials and doesn’t affect 
the result of succeeding trials. 

(iv) The probability of success (happening of an event) in any 
trial is p and is constant for each trial. q=1−p is then termed as 
the probability of failure (non-occurrence of the event) and is 
constant for each trial. 

For example, if we toss a fair coin n times (which is fixed and 
finite) then the outcome of any trial is one of the mutually exclusive 
events, viz., head (success) and tail (failure). Further, all the trials 
are independent, since the result of any throw of a coin does not 
affect and is not affected by the result of other throws. Moreover, 
the probability of success (head) in any trial is 1/2, which is constant 
for each trial. Hence the coin tossing problems will give rise to 
Binomial distribution. 

Similarly dice throwing problems will also conform to Binomial 
distribution. 

More precisely, we expect a Binomial distribution under the 
following conditions : 

(i) n, the number of trials, is finite. 

(ii) Trials are independent. 

(iii) p, the probability of success, is constant for each trial. 
Then q=1−p is the probability of failure in any trial. 

14.2.1. Probability Function of Binomial Distribution 

If X denotes the number of successes in n trials satisfying the 
above conditions, then X is a random variable which can take the 
values 0, 1, 2, ..., n ; since in n trials we may get no success (all 
failures), one success, two successes, ..., or all the n successes. 

We are interested in finding the corresponding probabilities of 
0, 1, 2, ..., n successes. The general expression for the probability 
of r successes is given by : 


Sampling Theory and Design of Sample Surveys 801 




5. Greater Scope. It appears that there is a possibility of 
obtaining detailed information only in a complete census where each 
and every unit in the population is enumerated. But in practice, 
because of our limitations in any statistical enquiry in terms of 
time, money and man hours, and because of the fact that the sampling 
procedure results in considerable savings in time, money and labour, 
it is possible to obtain more detailed and exhaustive information 
from the limited few units selected in the sample. Obviously, it is 
relatively easier to collect information on, say, 25 questions from 
each of 100 units selected in the sample than to obtain the information 
on, say, 10 questions from each of 1,000 units in the population. 
Moreover, complete enumeration is impracticable, rather inconceivable, 
if the enquiry requires highly trained personnel and more 
sophisticated equipment for the collection, processing and analysis 
of the data. The sampling procedure is more readily adaptable 
than census for statistical investigations. 


6. Infinite or/and Hypothetical Population. If the population 
is infinite or too large, then sampling procedure is the only way of 
estimating the parameters of a population. For instance, the number 
of fish in the sea or the number of wild elephants in a dense forest 
can be estimated only by sampling method. 

Similarly, in case of hypothetical population, as for example 
in the problem of throwing a die or tossing a coin where the process 
may continue a large number of times or indefinitely, the sampling 
procedure is the only scientific technique of estimating the 
parameters of the population. 


7. Destructive Testing. If the testing of units is destructive, 
i.e., if in the course of inspection the units are destroyed or affected 
adversely, then we are left with no other way but to resort to 
sampling. For example : 

(i) to estimate the average life of the bulbs or tubes in a 
given consignment, 

(ii) to determine the composition of a chemical salt, 

(iii) to test the breaking strength of chalks manufactured in 
a factory or to estimate the tensile strength of the steel rods, 

(iv) to test the quality of explosives, crackers, shells, etc., 

we have to inspect a representative sample since complete census 
will destroy all the items. 


15.7. Limitations of Sampling. The merits of sample surveys 
over complete enumeration can be realised only if : 

(i) the sample is drawn in a scientific manner, 
(ii) the appropriate sampling design is used, and 
(iii) the sample size is adequate. 

In spite of the above merits of the sample survey over census, 
the sampling procedure has its limitations and problems which are 
enumerated below : 

(i) If a sample survey is not properly planned (or designed) 
and executed carefully, the results obtained will not be reliable and 
quite often might even be misleading. In this context, it may be 
worthwhile to quote the words of Frederick F. Stephen : 

“Samples are like medicines. They can be harmful when they 
are taken carelessly or without knowledge of their effects... Every 
good sample should have a proper label with instructions about its 
use”. 

Sampling design must be perfect otherwise it might lead to 
serious complications in the final results. The omission of a few 
units in a complete census may be immaterial but non-response or 
incomplete response from even one or two units in a small sample 
might have a significant effect on the final result. 


(ii) An efficient sampling scheme requires the services of 
qualified, skilled and experienced personnel, better supervision and 
more sophisticated equipment and statistical techniques for the 
planning and execution of the survey and for the collection, processing 
and analysis of the sample data. In the absence of these the 
results of the survey may not be reliable. 

(iii) Sometimes the sample survey might require more time, 
money and labour than a complete census. This will be so if the 
sample size is a large proportion of the population size and if a 
complicated weighting system is used. 

(iv) Sampling procedure cannot be used if we want to obtain 
information about each and every unit of the population. Further, 
if the population is too heterogeneous, it may be impossible to use 
a sampling procedure. 

(v) Each sampling procedure, discussed in § 15.10 to § 15.15, 
has its own limitations. 


15.8. Errors in Statistics. In Statistics, the word ‘error’ is 
used to denote the difference between the true value and the estimated 
or approximated value. In other words ‘error’ refers to the 
difference between the true value of a population parameter and its 
estimate provided by an appropriate sample statistic computed by 
some statistical device. Thus, in Statistics, the term error is used 
in a different and much restricted sense. It should be distinguished 
from mistakes or inaccuracies which may be committed in the 
course of making observations, counting, calculations, etc. These 
errors in Statistics arise due to a number of factors such as : 

p(r)=P(X=r)=ⁿCᵣ pʳ qⁿ⁻ʳ ; r=0, 1, 2, ..., n ...(14.1) 

Proof. The probability of r successes and consequently (n−r) 
failures in a sequence of n trials in any fixed specified order, say, 
S F S S F F ... S S F, where S occurs r times and F occurs (n−r) 
times, is given by : 
P(S ∩ F ∩ S ∩ S ∩ F ∩ F ∩ ... ∩ S ∩ S ∩ F) 
= P(S) P(F) P(S) P(S) P(F) P(F) ... P(S) P(S) P(F) 
[By compound probability theorem, since the trials 
are independent] 
= p.q.p.p.q.q ... p.p.q 
= [p×p×p× ... r times] × [q×q×q× ... (n−r) times] 
= pʳ qⁿ⁻ʳ ...(*) 
But in n trials the total number of possible ways of obtaining 
r successes and (n−r) failures is 

n! / [r! (n−r)!] = ⁿCᵣ, 

all of which are mutually disjoint. But the probability for each of 
these ⁿCᵣ mutually exclusive ways is same as given in (*), viz., 
pʳ qⁿ⁻ʳ. Hence by the addition law of probability, the required 
probability of getting r successes and consequently (n−r) failures in n 
trials, in any order whatsoever, is given by : 
P(X=r) = pʳ qⁿ⁻ʳ + pʳ qⁿ⁻ʳ + ... + pʳ qⁿ⁻ʳ (ⁿCᵣ terms) 
= ⁿCᵣ pʳ qⁿ⁻ʳ ; r=0, 1, 2, ..., n 
Remarks. 1. Putting r=0, 1, 2, ..., n in (14.1) we get the 
probabilities of 0, 1, 2, ..., n successes respectively in n trials, which 
are tabulated below : 

r           :   0    1            2              ...   n 
p(r)=P(X=r) :   qⁿ   ⁿC₁ qⁿ⁻¹ p   ⁿC₂ qⁿ⁻² p²   ...   pⁿ 

Since these probabilities are the successive terms in the binomial 
expansion (q+p)ⁿ, it is called the Binomial distribution. 
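
The probability law (14.1) can be written as a short Python function. This is a minimal sketch; the name `binomial_pmf` and the choice n=4, p=1/3 are illustrative assumptions. Exact fractions are used so that the total of the probabilities can be checked exactly.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n - r), with q = 1 - p, as in (14.1)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 4, Fraction(1, 3)
probabilities = [binomial_pmf(r, n, p) for r in range(n + 1)]
print(probabilities)
print(sum(probabilities))  # 1
```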


2. Total probability is unity, i.e., 1 ; 

Σ p(r) = p(0)+p(1)+...+p(n)     (summation over r=0 to n) 
= qⁿ + ⁿC₁ qⁿ⁻¹ p + ⁿC₂ qⁿ⁻² p² + ... + pⁿ 
= (q+p)ⁿ = 1     (∵ p+q=1) 

3. The expression for P(X=r) in (14.1) is known as the probability 
(mass) function of the Binomial distribution with parameters n 
and p. The random variable X following the probability law (14.1) 
is called a Binomial Variate with parameters n and p. 

The Binomial distribution is completely determined, i.e., all 
the probabilities can be obtained, if n and p are known. Obviously, 
q is known when p is given because q=1−p. 

4. Since the random variable X takes only integral values, 
Binomial distribution is a discrete probability distribution. 

5. For n trials the binomial probability distribution consists 
of (n+1) terms, the successive binomial coefficients being 

ⁿC₀, ⁿC₁, ⁿC₂, ..., ⁿCₙ₋₁, ⁿCₙ 

Since ⁿC₀=ⁿCₙ=1, the first and last coefficient will always be 
1. Further, since 

ⁿCᵣ=ⁿCₙ₋ᵣ 
the binomial coefficients will be symmetric. Moreover, we have 
ⁿC₀ + ⁿC₁ + ⁿC₂ + ... + ⁿCₙ = 2ⁿ 

i.e., the sum of binomial coefficients is 2ⁿ. 


The values of the Binomial coefficients for different values of n 
can be obtained conveniently from the Pascal's triangle given 
below : 

Pascal's Triangle 
[Showing coefficients of terms in (a+b)ⁿ] 

Value of n    Binomial coefficients                                   Sum (2ⁿ) 
 1            1   1                                                   2 
 2            1   2   1                                               4 
 3            1   3   3   1                                           8 
 4            1   4   6   4   1                                       16 
 5            1   5   10   10   5   1                                 32 
 6            1   6   15   20   15   6   1                            64 
 7            1   7   21   35   35   21   7   1                       128 
 8            1   8   28   56   70   56   28   8   1                  256 
 9            1   9   36   84   126   126   84   36   9   1           512 
10            1   10   45   120   210   252   210   120   45   10   1   1024 
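
Each line of the triangle can be generated from the line above it by adding adjacent terms and placing 1 at both ends. The Python sketch below illustrates this; the generator name `pascal_rows` is an assumption introduced for the example.

```python
from math import comb

def pascal_rows(max_n):
    """Yield (n, coefficients) for n = 1..max_n, building each row from the last."""
    row = [1]  # row for n = 0
    for n in range(1, max_n + 1):
        # Interior terms are sums of adjacent terms in the preceding row.
        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
        yield n, row

for n, row in pascal_rows(10):
    assert row == [comb(n, r) for r in range(n + 1)]  # agrees with nCr
    assert row == row[::-1]                            # symmetric
    print(n, row, sum(row))
```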


It can be easily seen that, taking the first and last terms as 1, 
each term in the above table can be obtained by adding the two 
terms on either side of it in the preceding line (i.e., the line above 
it). As pointed out earlier it can be easily verified that the binomial 
coefficients are symmetric and the sum of the coefficients is 2ⁿ. 

14.2.2. Constants of Binomial Distribution 

r     p(r)            r . p(r)          r² . p(r) 
0     qⁿ              0                 0 
1     ⁿC₁ qⁿ⁻¹ p      ⁿC₁ qⁿ⁻¹ p        1² ⁿC₁ qⁿ⁻¹ p 
2     ⁿC₂ qⁿ⁻² p²     2 ⁿC₂ qⁿ⁻² p²     2² ⁿC₂ qⁿ⁻² p² 
3     ⁿC₃ qⁿ⁻³ p³     3 ⁿC₃ qⁿ⁻³ p³     3² ⁿC₃ qⁿ⁻³ p³ 
...   ...             ...               ... 
n     pⁿ              n pⁿ              n² pⁿ 

Mean = Σ r p(r) = ⁿC₁ qⁿ⁻¹ p + 2 ⁿC₂ qⁿ⁻² p² + 3 ⁿC₃ qⁿ⁻³ p³ + ... + n pⁿ 
= n qⁿ⁻¹ p + 2 [n(n−1)/2!] qⁿ⁻² p² + 3 [n(n−1)(n−2)/3!] qⁿ⁻³ p³ + ... + n pⁿ 
= np [qⁿ⁻¹ + (n−1) qⁿ⁻² p + {(n−1)(n−2)/2!} qⁿ⁻³ p² + ... + pⁿ⁻¹] 
= np [qⁿ⁻¹ + ⁿ⁻¹C₁ qⁿ⁻² p + ⁿ⁻¹C₂ qⁿ⁻³ p² + ... + pⁿ⁻¹] 
= np (q+p)ⁿ⁻¹     (By Binomial expansion) 
= np     [∵ p+q=1] 

Variance = Σ r² p(r) − [Σ r p(r)]² 
= Σ r² p(r) − (mean)² ...(*) 
Now 
Σ r² p(r) = 1² ⁿC₁ qⁿ⁻¹ p + 2² ⁿC₂ qⁿ⁻² p² + 3² ⁿC₃ qⁿ⁻³ p³ + ... + n² pⁿ 
= n qⁿ⁻¹ p + 4 [n(n−1)/2] qⁿ⁻² p² + 9 [n(n−1)(n−2)/6] qⁿ⁻³ p³ + ... + n² pⁿ 
= np [qⁿ⁻¹ + 2(n−1) qⁿ⁻² p + (3/2)(n−1)(n−2) qⁿ⁻³ p² + ... + n pⁿ⁻¹] 
= np [{qⁿ⁻¹ + (n−1) qⁿ⁻² p + ((n−1)(n−2)/2) qⁿ⁻³ p² + ... + pⁿ⁻¹} 
      + {(n−1) qⁿ⁻² p + (n−1)(n−2) qⁿ⁻³ p² + ... + (n−1) pⁿ⁻¹}] 
= np [(q+p)ⁿ⁻¹ + (n−1) p {qⁿ⁻² + (n−2) qⁿ⁻³ p + ... + pⁿ⁻²}] 
= np [(q+p)ⁿ⁻¹ + (n−1) p (q+p)ⁿ⁻²] 
= np [1+(n−1)p]     (∵ p+q=1) 


Substituting in (*) we get 
Variance = np[1+np−p] − (np)² 
= np[1+np−p−np] 
= np[1−p] = npq 
Hence for the binomial distribution, 
Mean = np ...(14.2) 
μ₂ = σ² = npq ...(14.3) 
Similarly we can obtain the other constants given below : 
μ₃ = npq(q−p) ...(14.4) 
μ₄ = npq[1+3pq(n−2)] ...(14.5) 

Hence the moment coefficient of skewness is : 
β₁ = μ₃²/μ₂³ = n²p²q²(q−p)² / (npq)³ 
= (q−p)² / npq ...(14.6) 
and γ₁ = +√β₁ = (q−p) / √(npq) ...(14.6a) 

Coefficient of kurtosis is given by : 
β₂ = μ₄/μ₂² = npq[1+3pq(n−2)] / (npq)² 
= [1+3pq(n−2)] / npq 
= 3 + (1−6pq)/npq ...(14.7) 
and γ₂ = β₂−3 = (1−6pq)/npq ...(14.7a) 


Remarks : 1. Since q is the probability (of failure), we always 
have 0<q<1. 

∴ Variance = np×q < np     (∵ 0<q<1) 
⇒ Variance < mean 
Hence for the Binomial distribution variance is less than mean. 
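
The constants (14.2), (14.3), (14.6) and (14.7) can be checked numerically by computing the moments directly from the probabilities. The Python sketch below does this for one arbitrarily chosen pair, n=12 and p=0.3; the function name `binomial_constants` is an assumption, and floating point values are compared with a tolerance.

```python
from math import comb, isclose

def binomial_constants(n, p):
    """Mean, variance, beta1 and beta2 computed directly from the probabilities."""
    q = 1 - p
    pmf = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(pmf))
    mu2, mu3, mu4 = (
        sum((r - mean) ** k * pr for r, pr in enumerate(pmf)) for k in (2, 3, 4)
    )
    return mean, mu2, mu3**2 / mu2**3, mu4 / mu2**2

n, p = 12, 0.3
q = 1 - p
mean, var, beta1, beta2 = binomial_constants(n, p)
print(isclose(mean, n * p), isclose(var, n * p * q))
print(isclose(beta1, (q - p) ** 2 / (n * p * q)))
print(isclose(beta2, 3 + (1 - 6 * p * q) / (n * p * q)))
print(var < mean)  # variance is less than mean
```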
2. As n → ∞, from equations (14.6) and (14.7) we get 
β₁ → 0, γ₁ → 0, β₂ → 3 and γ₂ → 0 
“Binomial distribution is symmetrical if p=q=0.5. It is positively 
skewed if p<0.5 and negatively skewed if p>0.5.” 

Obviously from (14.6) and (14.6a) we observe that as n 
increases, the skewness of the binomial distribution becomes less 
pronounced, irrespective of the values of p and q. 

14.2.3. Mode of Binomial Distribution. Mode is the value of 
X which maximises the probability function. Thus if X=r gives 
mode then we should have 

p(r)>p(r−1) and p(r)>p(r+1) ...(14.8) 

Working Rule to Find Mode of Binomial Distribution. Let X 
be a Binomial variate with parameters n and p. 

Case (I). When (n+1)p is an integer. 

Let (n+1)p=k (an integer). 

In this case the distribution is bi-modal, the two modal values 
being X=k and X=k−1. 

Thus if n=9 and p=0.4, then (n+1)p=10×0.4=4, which is 
an integer. Hence in this case the distribution is bi-modal, the two 
modal values being 4 and 4−1=3. 

Case (II). When (n+1)p is not an integer. 

Let (n+1)p=k₁+f where k₁ is the integral part and f is the 
fractional part of (n+1)p. In this case the distribution has a unique 
mode at X=k₁, the integral part of (n+1)p. 

For example, if n=7 and p=0.6, then (n+1)p=8×0.6=4.8. 
Hence Mode=4, the integral part of 4.8. 

Remark. If np is a whole number (i.e., integer), then the 
distribution is unimodal and the mean and mode are equal, each 
being np. 
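
The working rule can be compared with a direct search for the largest probability. The Python sketch below is illustrative; the function names are our own, and the rule as coded assumes 0<p<1. Exact fractions are used so that the two equal probabilities in the bi-modal case are recognised as equal.

```python
from fractions import Fraction
from math import comb, floor

def modes_by_search(n, p):
    """All values of r at which nCr p^r q^(n-r) is largest."""
    q = 1 - p
    pmf = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    peak = max(pmf)
    return [r for r, pr in enumerate(pmf) if pr == peak]

def modes_by_rule(n, p):
    """Working rule: look at (n + 1)p."""
    k = (n + 1) * p
    if k == floor(k):           # Case (I): two modes, k - 1 and k
        return [int(k) - 1, int(k)]
    return [floor(k)]           # Case (II): integral part of (n + 1)p

print(modes_by_search(9, Fraction(2, 5)), modes_by_rule(9, Fraction(2, 5)))  # [3, 4] [3, 4]
print(modes_by_search(7, Fraction(3, 5)), modes_by_rule(7, Fraction(3, 5)))  # [4] [4]
```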


Example 14.1. Ten unbiased coins are tossed simultaneously. 
Find the probability of obtaining, 

(i) Exactly 6 heads            (iv) At least one head 
(ii) At least 8 heads          (v) Not more than three heads 
(iii) No head                  (vi) At least 4 heads. 

Solution. If p denotes the probability of a head, then 
p=q=1/2. Here n=10. If the random variable X denotes the number 
of heads, then by the Binomial probability law, the probability 
of r heads is given by, 

p(r)=P(X=r)=ⁿCᵣ pʳ qⁿ⁻ʳ 
= ¹⁰Cᵣ (1/2)ʳ (1/2)¹⁰⁻ʳ = ¹⁰Cᵣ (1/2)¹⁰ 
= (1/1024) ¹⁰Cᵣ ...(*) 

(i) Required probability = p(6) = (1/1024) ¹⁰C₆ [From (*)] 
= 210/1024 = 105/512 

(ii) Required probability = P(X≥8) = p(8)+p(9)+p(10) 
= (1/1024) [¹⁰C₈ + ¹⁰C₉ + ¹⁰C₁₀] [From (*)] 
= (1/1024) (45+10+1) = 56/1024 = 7/128 

(iii) Required probability = P(X=0) = p(0) 
= (1/1024) ¹⁰C₀ = 1/1024 [From (*)] 

(iv) Required probability = P[At least one head] 
= 1−P[No head] 
= 1−p(0) 
= 1 − 1/1024 = 1023/1024 [From Part (iii)] 

(v) Required probability = P(X≤3) = p(0)+p(1)+p(2)+p(3) 
= (1/1024) [¹⁰C₀ + ¹⁰C₁ + ¹⁰C₂ + ¹⁰C₃] 
= (1/1024) (1+10+45+120) 
= 176/1024 = 11/64 

(vi) Required probability = P(X≥4) 
= p(4)+p(5)+...+p(10) 
= (1/1024) [¹⁰C₄ + ¹⁰C₅ + ... + ¹⁰C₁₀] 

Last part can be conveniently done as follows : 
Required probability = P(X≥4) = 1−P(X≤3) 
= 1−[p(0)+p(1)+p(2)+p(3)] 
= 1 − 11/64 = 53/64 [Part (v)] 

Example 14.2. Define Binomial Distribution. What is the 
probability of guessing correctly at least six of the ten answers in a 
TRUE-FALSE objective test ? 

[I.C.W.A. (Final) December 1979] 

Solution. Definition of Binomial Distribution : See Text. In 
a True-False Objective Test, the probability of guessing an answer 
correctly is given by : 

p = 1/2 ⇒ q = 1−p = 1/2 

By Binomial probability law, the probability of guessing 
correctly x answers in a 10-question test is given by : 
p(x) = ¹⁰Cₓ pˣ q¹⁰⁻ˣ = ¹⁰Cₓ (1/2)ˣ (1/2)¹⁰⁻ˣ 
= ¹⁰Cₓ (1/2)¹⁰ = (1/1024) ¹⁰Cₓ ...(*) 

Hence the required probability P of guessing correctly at 
least 6 of the 10 answers is given by : 

P = p(6)+p(7)+p(8)+p(9)+p(10) 
= (1/1024) [¹⁰C₆ + ¹⁰C₇ + ¹⁰C₈ + ¹⁰C₉ + ¹⁰C₁₀] [From (*)] 
= (1/1024) [¹⁰C₄ + ¹⁰C₃ + ¹⁰C₂ + ¹⁰C₁ + 1] 
= (1/1024) [(10×9×8×7)/4! + (10×9×8)/3! + (10×9)/2! + 10 + 1] 
= (1/1024) [210+120+45+10+1] 
= 386/1024 = 193/512 

Example 14.3. If the chance that the vessel arrives safely at a port is $\frac{9}{10}$, find the chance that out of 5 vessels expected at least 4 will arrive safely. [Delhi Uni. B.Com. (Hons.), 1980]

Solution. p = Probability that a vessel arrives safely at the port $= \frac{9}{10}$

$$\Rightarrow \quad q = 1 - p = 1 - \frac{9}{10} = \frac{1}{10}$$

By Binomial probability law, the probability that out of 5 vessels, x vessels arrive safely at the port is given by :

$$p(x) = {}^{5}C_x \left(\frac{9}{10}\right)^{x} \left(\frac{1}{10}\right)^{5-x} = \frac{1}{10^5}\, {}^{5}C_x\, 9^x \ ; \quad x = 0, 1, \ldots, 5$$

The required probability that at least 4 vessels will arrive safely is given by :

$$p(4) + p(5) = \frac{1}{10^5}\left[{}^{5}C_4\, 9^4 + {}^{5}C_5\, 9^5\right] = \frac{9^4}{10^5}\left[5 + 9\right] = \frac{14 \times 9^4}{10^5} = \frac{14 \times 6561}{100000} = 0.91854$$


Example 14.4. How many dice must be thrown so that there is a better than even chance of obtaining a six ?
[Delhi Uni. B.A. (Econ. Hons. I), 1985]
Solution. Let us suppose that the dice is thrown n times. The probability P of obtaining a six at least once in n throws of a dice is given by :
P = Probability of at least one six in n tosses of a dice
= 1 - Probability of 'no' six in n tosses of a dice

$$= 1 - \left(\frac{5}{6}\right)^{n}$$

We want P to be greater than 1/2,

$$i.e., \quad 1 - \left(\frac{5}{6}\right)^{n} > \frac{1}{2} \quad \Rightarrow \quad \frac{1}{2} > \left(\frac{5}{6}\right)^{n}$$

$$i.e., \quad 0.5 > (0.83)^{n} \qquad \ldots(*)$$

n :           1       2       3       4       5       6 ...
(0.83)^n :    0.83    0.6889  0.5718  0.4746  0.3939  0.3269 ...

By trial, we find that the inequality (*) is satisfied when $n \ge 4$. Hence the dice must be thrown at least 4 times.
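The trial-and-error search above can be sketched in a few lines. This is only an illustration: the function name `min_throws` and its parameters are hypothetical, and the exact value 1/6 is used for the chance of a six instead of the rounded 0.83 for its complement.

```python
def min_throws(p_success, target):
    """Smallest n with P(at least one success in n trials) > target.

    Uses P(at least one success) = 1 - (1 - p_success)**n and simply
    increases n until the inequality holds, as in the trial method.
    """
    n = 1
    while 1 - (1 - p_success) ** n <= target:
        n += 1
    return n


# Chance of a six on one die is 1/6; we want better than an even chance.
dice_needed = min_throws(1 / 6, 0.5)
print(dice_needed)
```

The same helper applies to any "how many trials so that at least one success is more likely than a given level" question, for instance a marksman with hit probability 1/4 and a required level of 2/3.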
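Aliter. From (*), we get

$$0.5 > (0.83)^{n} \quad \Rightarrow \quad \log\left(\tfrac{1}{2}\right) > n \log (0.83)$$

$$\Rightarrow \quad -\log 2 > n\,(\bar{1}.9191) \quad \Rightarrow \quad -0.3010 > n\,(-1 + 0.9191)$$

$$\Rightarrow \quad -0.3010 > -0.0809\, n \quad \Rightarrow \quad 0.3010 < 0.0809\, n$$

(Division by negative quantity changes the sign of the inequality)

$$\Rightarrow \quad n > \frac{0.3010}{0.0809} \quad \Rightarrow \quad n > 3.72 \quad \Rightarrow \quad n \ge 4$$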


Example 14.5. Assume that half the population is vegetarian so that the chance of an individual being a vegetarian is $\frac{1}{2}$. Assuming that 100 investigators each take sample of 10 individuals to see whether they are vegetarians, how many investigators would you expect to report that three people or less were vegetarian ?

[I.C.W.A. (Final), December 1981 ; Banaras Uni. M.Com., 1976]
Solution. In the usual notations we have : n = 10,
p = probability that an individual is a vegetarian $= \frac{1}{2}$
$q = 1 - p = \frac{1}{2}$

Then by Binomial probability law, the probability that there are r vegetarians in a sample of 10 is given by :

$$p(r) = {}^{10}C_r\, p^r q^{10-r} = {}^{10}C_r \left(\tfrac{1}{2}\right)^{r} \left(\tfrac{1}{2}\right)^{10-r} = {}^{10}C_r \left(\tfrac{1}{2}\right)^{10} = \frac{1}{1024}\, {}^{10}C_r \qquad \ldots(*)$$

Thus the probability that in a sample of 10, three or less people are vegetarian is :

$$p(0) + p(1) + p(2) + p(3) = \frac{1}{1024}\left[{}^{10}C_0 + {}^{10}C_1 + {}^{10}C_2 + {}^{10}C_3\right] \qquad \text{[From (*)]}$$

$$= \frac{1}{1024}\left[1 + 10 + 45 + 120\right] = \frac{176}{1024} = \frac{11}{64}$$

Hence, out of 100 investigators, the number of investigators who will report 3 or less vegetarians in a sample of 10 is :

$$100 \times \frac{11}{64} = \frac{275}{16} = 17.19 \simeq 17,$$

since the number of investigators cannot be in fraction.
Example 14.6. Comment on the following :
For a binomial distribution, mean = 7 and variance = 11.
[I.C.W.A. (Final), Dec. 1977]
Solution. For a binomial distribution with parameters n and p,
Mean = np = 7 ...(i) ; Variance = npq = 11 ...(ii)
Dividing (ii) by (i), we get

$$q = \frac{11}{7} = 1.57,$$

which is impossible, since q, being the probability, must lie between 0 and 1. Hence, the given statement is wrong.


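Example 14.7. The mean and variance of a binomial distribution are 3 and 2, respectively. Find the probability that the variate takes values less than or equal to 2.

[Delhi Uni. B.A. (Econ. Hons. I), 1983]

Solution. If n and p are the parameters of the binomial distribution, then, we are given :

Mean = np = 3 ...(i)
and Variance = npq = 2 ...(ii)
Dividing (ii) by (i), we get

$$q = \frac{npq}{np} = \frac{2}{3} \quad \Rightarrow \quad p = 1 - q = \frac{1}{3}$$

Substituting in (i), we get

$$n \times \frac{1}{3} = 3 \quad \Rightarrow \quad n = 9$$

Let $X \sim B(n, p)$, where n = 9 and $p = \frac{1}{3}$, $q = \frac{2}{3}$. Then

$$P(X = r) = {}^{n}C_r\, p^r q^{n-r} \ ; \quad r = 0, 1, 2, \ldots, n$$

$$= {}^{9}C_r \left(\tfrac{1}{3}\right)^{r} \left(\tfrac{2}{3}\right)^{9-r} \ ; \quad r = 0, 1, 2, \ldots, 9 \qquad \ldots(*)$$

The probability that the variate takes the value less than or equal to 2 is given by :

$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$$

$$= \left(\tfrac{2}{3}\right)^{9} + 9 \left(\tfrac{1}{3}\right)\left(\tfrac{2}{3}\right)^{8} + 36 \left(\tfrac{1}{3}\right)^{2}\left(\tfrac{2}{3}\right)^{7}$$

$$= \left(\tfrac{2}{3}\right)^{7}\left[\tfrac{4}{9} + 2 + 4\right] = \frac{58}{9}\left(\tfrac{2}{3}\right)^{7}$$

$$= 6.444 \times 0.05853 = 0.3772$$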
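Examples 14.6 and 14.7 both start from a stated mean and variance and work back to n, p and q. A minimal sketch of that step, assuming exact rational arithmetic is wanted, is given below; the function name `binomial_params` and the choice to raise `ValueError` for an inconsistent pair are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from math import comb


def binomial_params(mean, variance):
    """Recover (n, p, q) of a binomial distribution from np and npq.

    Raises ValueError when q = variance / mean is not a probability
    strictly between 0 and 1, or when n = mean / p is not an integer.
    """
    mean, variance = Fraction(mean), Fraction(variance)
    q = variance / mean                # npq / np
    if not 0 < q < 1:
        raise ValueError(f"q = {q} is not a valid probability")
    p = 1 - q
    n = mean / p
    if n.denominator != 1:
        raise ValueError(f"n = {n} is not an integer")
    return int(n), p, q


# Example 14.7: mean 3, variance 2.
n, p, q = binomial_params(3, 2)
prob_at_most_2 = sum(comb(n, r) * p**r * q ** (n - r) for r in range(3))
print(n, p, q, prob_at_most_2, float(prob_at_most_2))
```

Calling `binomial_params(7, 11)` with the figures of Example 14.6 raises the error, which is the programmatic form of the comment that q = 11/7 cannot be a probability.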

Example 14.8. If the probability of a defective bolt is 1/10, find (i) the mean ; (ii) variance ; (iii) moment coefficient of skewness ; (iv) kurtosis, for the distribution of defective bolts in a total of 400.

[I.C.W.A. (Final), June 1980 ; Dec., 1977]

Solution. In the usual notations, we have

$$n = 400, \quad p = \frac{1}{10} = 0.1, \quad q = 1 - p = 0.9$$

According to Binomial probability law :
(i) Mean = np = 400 × 0.1 = 40
(ii) Variance = npq = 400 × 0.1 × 0.9 = 36
(iii) The moment coefficient of skewness

$$\beta_1 = \frac{(q - p)^2}{npq} = \frac{(0.8)^2}{36} = \frac{0.64}{36} = 0.01777 \simeq 0.018$$

$$\gamma_1 = +\sqrt{\beta_1} = \sqrt{0.018} = 0.134$$

(iv) Coefficient of kurtosis is given by :

$$\beta_2 = 3 + \frac{1 - 6pq}{npq} = 3 + \frac{1 - 6 \times 0.1 \times 0.9}{36} = 3 + \frac{0.46}{36} = 3 + 0.013 = 3.013$$

$$\gamma_2 = \beta_2 - 3 = 0.013$$
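The four constants of this example follow directly from n and p, so they can be collected in one small helper. The sketch below is illustrative only: the function name `binomial_constants` and the dictionary keys are hypothetical, and the signed form $\gamma_1 = (q - p)/\sqrt{npq}$ is used, which agrees with $+\sqrt{\beta_1}$ here because q > p.

```python
from math import sqrt


def binomial_constants(n, p):
    """Mean, variance, skewness and kurtosis coefficients of B(n, p)."""
    q = 1 - p
    variance = n * p * q
    beta2 = 3 + (1 - 6 * p * q) / variance
    return {
        "mean": n * p,
        "variance": variance,
        "beta1": (q - p) ** 2 / variance,
        "gamma1": (q - p) / sqrt(variance),
        "beta2": beta2,
        "gamma2": beta2 - 3,
    }


# Figures of the bolt example: n = 400 bolts, p = 0.1 defective.
bolts = binomial_constants(400, 0.1)
for name, value in bolts.items():
    print(f"{name} = {value:.4f}")
```

Without intermediate rounding the skewness comes out as about 0.1333 rather than 0.134; the small difference arises only because the hand computation rounds $\beta_1$ to 0.018 before taking the square root.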


14.2.4. Fitting of Binomial Distribution

Suppose a random experiment consists of n trials, satisfying the conditions of Binomial distribution and suppose this experiment is repeated N times. Then the frequency of r successes is given by the formula :

$$N \times p(r) = N \times {}^{n}C_r\, p^r q^{n-r} \ ; \quad r = 0, 1, 2, \ldots, n \qquad (14.9)$$

Putting r = 0, 1, 2, ..., n we get the expected or theoretical frequencies of the Binomial distribution, which are given in the following table.

No. of Successes (r)      Expected or Theoretical Frequencies N.p(r)
0                         $N q^{n}$
1                         $N \cdot {}^{n}C_1\, q^{n-1} p$
2                         $N \cdot {}^{n}C_2\, q^{n-2} p^{2}$
...                       ...
n                         $N p^{n}$

If p, the probability of success which is constant for each trial, is known, then the expected frequencies can be obtained easily as given in the above table. However, if p is not known and if we want to graduate or fit a binomial distribution to a given frequency distribution, we first find the mean of the given frequency distribution by the formula $\bar{x} = \sum fx / \sum f$ and equate it to np, which is the mean of the binomial probability distribution. Hence, p can be estimated by the relation

$$np = \bar{x} \quad \Rightarrow \quad p = \frac{\bar{x}}{n} \qquad (14.10)$$

Then q = 1 - p. With these values of p and q, the expected or theoretical Binomial frequencies can be obtained by using the formulae given in the above table.


Example 14.9. (a) 8 coins are tossed at a time, 256 times. Find the expected frequencies of successes (getting a head) and tabulate the results obtained. [C.A. (Intermediate), May 1973]

(b) Also obtain the values of the mean and standard deviation of the theoretical (fitted) distribution.

Solution. In the usual notations, we are given
n = 8, N = 256
p = Probability of success (head) in a single throw of a coin $= \frac{1}{2}$
so that $q = 1 - p = \frac{1}{2}$
Hence, by the Binomial probability law, the probability of r successes in a toss of 8 coins is given by :

$$p(r) = {}^{8}C_r\, p^r q^{8-r} = {}^{8}C_r \left(\tfrac{1}{2}\right)^{8} = \frac{1}{256}\, {}^{8}C_r \qquad \ldots(*)$$

Hence in 256 throws of 8 coins, the frequency of r successes is :

$$f(r) = N \cdot p(r) = 256 \times \frac{1}{256}\, {}^{8}C_r = {}^{8}C_r$$

Thus the expected (theoretical) frequencies are as tabulated below :

EXPECTED BINOMIAL FREQUENCIES

No. of heads (r)      Expected frequency f(r) = 8Cr
0                     1
1                     8
2                     28
3                     56
4                     70
5                     56
6                     28
7                     8
8                     1
Total                 256

(b) For the theoretical distribution (Binomial distribution),

$$\text{Mean} = np = 8 \times \tfrac{1}{2} = 4$$

$$\text{s.d.} = \sqrt{npq} = \sqrt{8 \times \tfrac{1}{2} \times \tfrac{1}{2}} = \sqrt{2} = 1.4142$$
Example 14.10. Fit a binomial distribution to the following data :

x :   0    1    2    3    4
f :   28   62   46   10   4
[Delhi Uni. B.Com. (Hons.), 1983]
Solution. In the usual notations we have :
n = 4 ; N = Σf = 150
If p is the parameter of the binomial distribution, then
np = Mean of the distribution $= \bar{x}$ ...(*)

$$\text{Now} \quad \bar{x} = \frac{\sum fx}{\sum f} = \frac{0 + 62 + 92 + 30 + 16}{150} = \frac{200}{150} = \frac{4}{3}$$

Substituting in (*) we get

$$4p = \frac{4}{3} \quad \Rightarrow \quad p = \frac{1}{3} \quad \text{and} \quad q = 1 - p = \frac{2}{3}$$

The expected binomial probabilities are given by :

$$p(x) = {}^{4}C_x\, p^x q^{4-x} = {}^{4}C_x \left(\tfrac{1}{3}\right)^{x} \left(\tfrac{2}{3}\right)^{4-x} \qquad \ldots(**)$$

Putting x = 0, 1, 2, 3, and 4 in (**), we get the expected binomial probabilities as given in the following table.

FITTING OF BINOMIAL DISTRIBUTION

x     p(x)                                              Frequency f(x) = N.p(x) = 150 p(x)
0     $(2/3)^4 = 16/81 = 0.19753$                       29.63 ≃ 30
1     $4 \times (1/3)(2/3)^3 = 32/81 = 0.39506$         59.26 ≃ 59
2     $6 \times (1/3)^2 (2/3)^2 = 24/81 = 0.29630$      44.44 ≃ 44
3     $4 \times (1/3)^3 (2/3) = 8/81 = 0.09877$         14.81 ≃ 15
4     $(1/3)^4 = 1/81 = 0.01235$                        1.85 ≃ 2

Hence the fitted binomial distribution is :

x :   0    1    2    3    4    Total
f :   30   59   44   15   2    150
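The fitting procedure (estimate p from the sample mean, then multiply the binomial probabilities by N) can be sketched as follows. This is a minimal illustration: the function name `fit_binomial` is hypothetical, and it assumes that the observed frequencies are listed for x = 0, 1, ..., n, so that n is one less than the number of classes.

```python
from math import comb


def fit_binomial(freqs):
    """Fit B(n, p) to observed frequencies of x = 0, 1, ..., n.

    p is estimated from the relation n * p = sample mean; the function
    returns the estimate of p and the expected frequencies N * p(x).
    """
    n = len(freqs) - 1                      # assumed number of trials
    total = sum(freqs)                      # N
    mean = sum(x * f for x, f in enumerate(freqs)) / total
    p = mean / n
    q = 1 - p
    expected = [total * comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    return p, expected


# Observed data of the example: x = 0..4 with frequencies 28, 62, 46, 10, 4.
p_hat, expected = fit_binomial([28, 62, 46, 10, 4])
print(round(p_hat, 4))
print([round(e, 2) for e in expected])
print([round(e) for e in expected])
```

Rounding each expected frequency to the nearest integer happens to preserve the total of 150 for these data; in general the rounded values may need a small adjustment to add up to N.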
EXERCISE 14.1 


1. What do you understand by theoretical distributions ? Discuss their 
utility in statistics. 


2. What are the conditions under which a Binomial distribution can be used as an approximation to an observed frequency distribution ? Discuss the conditions carefully.
[Delhi Uni. B.A. (Econ. Hons.), 1978]

3. (a) What do you understand by 'binomial' distribution ? What are its main features ? [Delhi Uni. B.Com. (Hons.), 1982 ; Himachal Pradesh Uni. M.Com., 198?]
(b) State the conditions underlying the binomial distribution.
[Delhi Uni. B.A. (Econ. Hons. I), 1984]

4. (a) Define a binomial variate with parameters n and p and obtain its probability function.
(b) Obtain an expression for the mean of the binomial distribution in terms of the number of trials and the probability of success.
[Delhi Uni. B.A. (Econ. Hons. I), 1983]


5. Obtain the expressions for the mean and variance of a binomial distribution with parameters n and p. Hence show that for the Binomial distribution, variance is less than mean.

6. 20% of the bolts produced by a machine are defective. Deduce the probability distribution of the number of defectives in a sample of 5 bolts chosen at random.

[I.C.W.A. (Final), June 1984]

Ans. $p(x) = {}^{5}C_x (1/5)^{x} (4/5)^{5-x}$ ; x = 0, 1, 2, 3, 4, 5.

7. Four coins are tossed simultaneously. What is the probability of getting (i) 2 heads and 2 tails, (ii) at least two heads, and (iii) at least one head. [C.A. (Intermediate), May 1977]

Ans. (i) 3/8, (ii) 11/16, (iii) 15/16


8. What do you understand by Binomial distribution ? What are its features ?

Three perfect coins are tossed together. What is the probability of getting at least one head ?
Ans. 7/8 [Delhi Uni. B.Com. (Hons.), 1982]

9. An oil exploration firm finds that 5% of the test wells it drills yield a deposit of natural gas. If it drills 6 wells, find the probability that at least one well will yield gas. (Simplification is not necessary.)

Ans. $1 - (0.95)^{6}$ (Bombay Uni. B.Com., October)


10. An accountant is to audit 24 accounts of a firm. Sixteen of these 
are of highly-valued customers. If the accountant selects 4 of the accounts at 


random, what is the Probability that he chooses at least one highly-valued 
account ? 


Ans. 80/81 (Bombay Uni. B.Com., Nov. 1982)


11. Eight coins are thrown simultaneously. Show that the probability of obtaining at least 6 heads is 37/256.
[I.C.W.A. (Final), June 1974]

12. On an average 2% of the population in an area suffers from T.B. What is the probability that out of 5 persons chosen at random from this area at least two suffer from T.B. ? (Simplification is not necessary.)

Ans. $1 - (0.98)^{4} \times 1.08$ (Bombay Uni. B.Com., April 1982)


13. Assuming that it is true that 2 in 10 industrial accidents are due to fatigue, find the probability that :

(i) exactly 2 of 8 industrial accidents will be due to fatigue.
(ii) at least 2 of 8 industrial accidents will be due to fatigue.

[Delhi Uni. B.A. (Econ. Hons.), 1980]
Ans. (i) ${}^{8}C_2 (0.2)^{2} (0.8)^{6}$, (ii) $1 - (0.8)^{7} \times 2.4$
14. If on an average 1 ship in every 10 is sunk, find the chance that out of 5 ships expected, at least 4 will arrive safely.
Ans. $1.4 \times (0.9)^{4} = 0.9185$


15. The incidence of occupational disease in an industry is such that the workmen have a 20% chance of suffering from it. What is the probability that out of six workmen, 4 or more will contract the disease ?

[A.I.M.A. (Dip. in Management), Jan. 1980 ; Himachal Pradesh Uni. M.Com., Feb. 1983]

Ans. 53/3125 = 0.0170

16. The probability of a bomb hitting a target is 1/5. Two bombs are enough to destroy a bridge. If six bombs are aimed at the bridge, find the probability that the bridge is destroyed.

Hint : n = 6, p = 1/5

The bridge is destroyed if at least two of the bombs hit it. Hence the required probability that the bridge is destroyed is given by :

$$p(2) + p(3) + p(4) + p(5) + p(6) = 1 - [p(0) + p(1)] = 1 - \frac{2048}{3125} = 0.345$$


17. In a multiple choice examination, there are 20 questions. Each question has four alternative answers following it and the student must select the one correct answer. Four marks are given for the correct answer and one mark is deducted for every wrong answer. A student must secure at least 50% of maximum possible marks to pass the examination. Suppose that a student has not studied at all so that he decides to select the answers to the questions on a random basis. What is the probability that he will pass in the examination ?

(Bombay Uni. B.Com., May 1978)
Hint. Student will pass if he answers correctly at least 12 questions.

Ans. Reqd. probability $= \sum_{x=12}^{20} {}^{20}C_x \left(\tfrac{1}{4}\right)^{x} \left(\tfrac{3}{4}\right)^{20-x}$


18. The probability of a man hitting a target is 1/4. (i) If he fires 7 times, what is the probability P of his hitting the target at least twice ? (ii) How many times must he fire so that the probability of his hitting the target at least once is greater than 2/3 ?

[Delhi Uni. B.A. (Econ. Hons. I), 1984]
Ans. (i) 0.555, (ii) 4.

19. Suppose that half the population of a town are consumers of rice. 100 investigators are appointed to find out its truth. Each investigator interviews 10 individuals. How many investigators do you expect to report that three or less of the people interviewed are consumers of the rice ?
[I.C.W.A. (Final), June 1979]
Ans. 17


20. (a) In 256 sets of twelve tosses of a coin, in how many cases may one expect eight heads and four tails ?
(b) In 100 sets of ten tosses of an unbiased coin, in how many cases should we expect :
(i) seven heads and three tails,
(ii) at least seven heads ?
Ans. (a) 31, (b) (i) 12, (ii) 17.
21. Out of 1,000 families of 3 children each, how many families would you expect to have two boys and one girl assuming that boys and girls are equally probable ?
(Bombay Uni. B.Com., May 1980)
Ans. 1000 × 0.375 = 375
22. (a) Bring out the fallacy, if any, in the following statement :
The mean of a binomial distribution is 5 and its s.d. is 3.
Ans. It makes q = 1.8 which is wrong.
23. The following statement cannot be true, why ? "The mean of a binomial distribution is 4 and its standard deviation is 3".
[I.C.W.A. (Final), Dec. 1979]
Ans. q = 2.25 which is impossible.
24. The mean of a binomial distribution is 4 and its standard deviation is √3. What are the values of n, p and q with usual notation ?

[Delhi Uni. B.A. (Econ. Hons.), 1982]
Ans. n = 16, p = 1/4, q = 3/4.

25. A discrete random variable X has mean equal to 6 and variance equal to 2. If it is assumed that the underlying distribution of X is binomial, what is the probability that 5 ≤ X ≤ 7 ?
[Delhi Uni. B.A. (Econ. Hons. I), 1987]
Hint. We have np = 6 and npq = 2, so that q = 1/3, p = 2/3 and n = 9.
Required probability = P(5 ≤ X ≤ 7) = p(5) + p(6) + p(7) = 4672/6561.
26. Compute the mode of a binomial distribution with p = 1/4 and n = 7.
Ans. Bimodal ; Modes are 2 and 1.
27. What is the most probable number of times an ace will appear if a die is tossed (i) … times, (ii) … times ?
[Delhi Uni. B.A. (Econ. Hons.), 1979]
Ans. (i) 8, (ii) Bimodal ; Modes are 8 and 9.
28. If the probability of a defective bolt is 0.1, find (a) the mean and standard deviation for the distribution of defective bolts in a total of 500, and (b) the moment coefficients of skewness and kurtosis of the distribution.
Ans. Mean = 50, σ = 6.7, γ₁ = 0.119, β₂ = 3.01, γ₂ = 0.01
29. Five fair coins were tossed 100 times. From the following outcomes calculate the expected frequencies.
No. of heads up :      0   1   2   3   4   5
Observed frequency : 1 10 24 s 18 8 


[Delhi Uni. B.Com. (Hons.) II, 1984]


30. Five coins are tossed 3200 times, find the frequencies of the distribution of heads and tails and tabulate the results.

Calculate the mean number of successes and standard deviation.
[C.A. (Intermediate), May 1977]
Ans. x (No. of heads) :   0     1     2      3      4     5
f(x) (frequency) :        100   500   1000   1000   500   100
и. following 

М dn 40S uon re d Who om ам теш of throwing 12 
Sarees 


Ф Preguency Swecess Frequency 
| i : Е 
» " 
H 10 71 
4 4 
: » B 5 
Ы sa 
Роа the and 
of the expected darian rene Ce ed р а 
LL 


1h vr 
1, 12, 66, 220, 495, 792, 924, 792, 495, 220, 66, 12, 1.
Expected mean = 6, Actual mean = 6.139,

s.d. of fitted distribution $= \sqrt{npq} = 1.732$

14.3. Poisson Distribution (As a Limiting Case of Binomial Distribution). Poisson distribution was derived in 1837 by a French mathematician Simeon D. Poisson (1781-1840). Poisson distribution may be obtained as a limiting case of Binomial probability distribution under the following conditions :

(i) n, the number of trials is indefinitely large, i.e., $n \to \infty$.

(ii) p, the constant probability of success for each trial is indefinitely small, i.e., $p \to 0$.

(iii) np = m (say), is finite.

Under the above three conditions the Binomial probability function (14.1) tends to the probability function of the Poisson distribution given below :

$$p(r) = P(X = r) = \frac{e^{-m}\, m^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots \qquad (14.10)$$

where X is the number of successes (occurrences of the event), m = np and e = 2.71828 [The base of the system of Natural logarithms]

and $r! = r(r-1)(r-2) \ldots \times 3 \times 2 \times 1$.
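Derivation of (14.10). We shall obtain the limiting form of (14.1) under the conditions :

$$n \to \infty \quad \text{and} \quad np = m \quad \Rightarrow \quad p = \frac{m}{n} \quad \text{and} \quad q = 1 - \frac{m}{n}$$

Probability function of Binomial distribution is

$$p(r) = {}^{n}C_r\, p^{r} q^{n-r} = \frac{n(n-1)(n-2)\ldots(n-r+1)}{r!} \left(\frac{m}{n}\right)^{r} \left(1 - \frac{m}{n}\right)^{n-r}$$

$$= \frac{m^{r}}{r!} \left[1 \cdot \left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) \ldots \left(1 - \frac{r-1}{n}\right)\right] \left(1 - \frac{m}{n}\right)^{n} \left(1 - \frac{m}{n}\right)^{-r}$$

Taking the limit as $n \to \infty$, with r and m fixed,

$$\lim_{n \to \infty} p(r) = \frac{m^{r}}{r!} \times \lim_{n \to \infty}\left(1 - \frac{1}{n}\right) \times \ldots \times \lim_{n \to \infty}\left(1 - \frac{r-1}{n}\right) \times \lim_{n \to \infty}\left(1 - \frac{m}{n}\right)^{n} \times \lim_{n \to \infty}\left(1 - \frac{m}{n}\right)^{-r} \qquad \ldots(*)$$

But we know that :

$$\lim_{n \to \infty}\left(1 + \frac{\lambda}{n}\right)^{n} = e^{\lambda} \quad \text{and} \quad \lim_{n \to \infty}\left(1 - \frac{\lambda}{n}\right) = 1, \qquad (14.11)$$

if λ is constant independent of n. Substituting these values in (*), we get the limiting form of Binomial probability function as

$$\frac{m^{r}}{r!}\left[1 \times 1 \times \ldots \times e^{-m} \times 1\right] = \frac{e^{-m}\, m^{r}}{r!}$$

Hence the probability function of the Poisson distribution is

$$p(r) = P(X = r) = \frac{e^{-m}\, m^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots$$

as stated in (14.10).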
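The limiting argument can also be seen numerically: holding m = np fixed and letting n grow, the binomial probabilities approach the Poisson ones. The sketch below is only an illustration under assumed values (m = 2 and n = 10, 100, 1000, 10000 are arbitrary choices); the function names are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def poisson_pmf(r, m):
    """P(X = r) for a Poisson variate with parameter m."""
    return exp(-m) * m**r / factorial(r)


def max_gap(n, m, r_max=10):
    """Largest |binomial - Poisson| over r = 0..r_max with p = m / n."""
    p = m / n
    return max(abs(binomial_pmf(r, n, p) - poisson_pmf(r, m))
               for r in range(r_max + 1))


m = 2.0                                   # assumed fixed value of np
gaps = [max_gap(n, m) for n in (10, 100, 1000, 10000)]
for n, gap in zip((10, 100, 1000, 10000), gaps):
    print(f"n = {n:>5}: largest difference = {gap:.6f}")
```

The printed differences shrink as n increases, which is what the derivation predicts when p = m/n tends to zero with np held constant.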


Remarks. 1. Poisson distribution is a discrete probability distribution, since the variable X can take only integral values 0, 1, 2, ..., ∞.

2. Putting r = 0, 1, 2, 3, ... in (14.10), we obtain the probabilities of 0, 1, 2, 3, ... successes respectively, which are tabulated below :

No. of Successes (r)      Probability p(r)
0                         $e^{-m}$
1                         $m\, e^{-m}$
2                         $\dfrac{m^{2} e^{-m}}{2!}$
3                         $\dfrac{m^{3} e^{-m}}{3!}$
...                       ...
r                         $\dfrac{m^{r} e^{-m}}{r!}$

The values of $e^{-m}$ for some selected values of m are given in Table V in the Appendix at the end of the book.

3. Total probability is 1.

$$\sum_{r=0}^{\infty} p(r) = e^{-m} + m\, e^{-m} + \frac{m^{2}}{2!} e^{-m} + \frac{m^{3}}{3!} e^{-m} + \ldots$$

$$= e^{-m}\left[1 + m + \frac{m^{2}}{2!} + \frac{m^{3}}{3!} + \ldots\right]$$

$$= e^{-m} \times e^{m} \qquad \left[\because \ e^{m} = 1 + m + \frac{m^{2}}{2!} + \frac{m^{3}}{3!} + \ldots\right] \qquad (14.12)$$

$$= e^{-m+m} = e^{0} = 1 \qquad \text{[By law of indices]}$$

4. If we know m, all the probabilities of the Poisson distribution can be obtained. m is, therefore, called the parameter of the Poisson distribution.


14.3.1. Utility or Importance of Poisson Distribution.

Poisson distribution can be used to explain the behaviour of the discrete random variables where the probability of occurrence of the event is very small and the total number of possible cases is sufficiently large. As such Poisson distribution has found application in a variety of fields such as Queueing Theory (waiting time problems), Insurance, Physics, Biology, Business, Economics and Industry. We give below some practical situations where Poisson distribution can be used :

(i) Number of telephone calls arriving at a telephone switch board in unit time (say, per minute).

(ii) Number of customers arriving at the super market ; say per hour.

(iii) The number of defects per unit of manufactured product (This is done for the construction of control chart for c in Industrial Quality Control).

(iv) To count the number of radio-active disintegrations of a radio-active element per unit of time (Physics).

(v) To count the number of bacteria per unit (Biology).

(vi) The number of defective material say, pins, blades etc. in a packing manufactured by a good concern.

(vii) The number of suicides reported in a particular day or the number of casualties (persons dying) due to a rare disease such as heart attack or cancer or snake bite in a year.

(viii) Number of accidents taking place per day on a busy road.

(ix) Number of typographical errors per page in a typed material or the number of printing mistakes per page in a book.

(i), (ii), (iv), (vii) and (viii) are examples of temporal distributions and the remaining are examples of spatial distributions.

14.3.2. Constants of Poisson Distribution


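$$\text{Mean} = \sum_{r=0}^{\infty} r\, p(r) = 1 \cdot m\, e^{-m} + 2 \cdot \frac{m^{2} e^{-m}}{2!} + 3 \cdot \frac{m^{3} e^{-m}}{3!} + \ldots$$

$$= m\, e^{-m}\left[1 + m + \frac{m^{2}}{2!} + \frac{m^{3}}{3!} + \ldots\right]$$

$$= m\, e^{-m} \times e^{m} \qquad \text{[using (14.12)]}$$

$$= m \qquad (\because \ e^{-m} e^{m} = e^{0} = 1) \qquad (14.13)$$

$$\text{Variance} = \sum r^{2} p(r) - \left[\sum r\, p(r)\right]^{2} = \sum r^{2} p(r) - (\text{mean})^{2} = \sum r^{2} p(r) - m^{2} \qquad \ldots(*)$$

Now

$$\sum r^{2} p(r) = 1^{2} \cdot m\, e^{-m} + 2^{2} \cdot \frac{m^{2} e^{-m}}{2!} + 3^{2} \cdot \frac{m^{3} e^{-m}}{3!} + 4^{2} \cdot \frac{m^{4} e^{-m}}{4!} + \ldots$$

$$= m\, e^{-m}\left[1 + 2m + \frac{3m^{2}}{2!} + \frac{4m^{3}}{3!} + \ldots\right]$$

$$= m\, e^{-m}\left[\left(1 + m + \frac{m^{2}}{2!} + \frac{m^{3}}{3!} + \ldots\right) + \left(m + \frac{2m^{2}}{2!} + \frac{3m^{3}}{3!} + \ldots\right)\right]$$

$$= m\, e^{-m}\left[\left(1 + m + \frac{m^{2}}{2!} + \ldots\right) + m\left(1 + m + \frac{m^{2}}{2!} + \ldots\right)\right]$$

$$= m\, e^{-m}\left[e^{m} + m\, e^{m}\right] \qquad \text{[using (14.12)]}$$

$$= m\, e^{-m} \cdot e^{m}(1 + m) = m(1 + m)\, e^{0} = m(1 + m)$$

Substituting in (*) we get

$$\text{Variance} = m(1 + m) - m^{2} = m + m^{2} - m^{2} = m \qquad (14.14)$$

Hence for the Poisson distribution with parameter m, we have

$$\text{Mean} = \text{Variance} = m \qquad (14.15)$$

i.e., mean and variance are equal, each being equal to the parameter m.

Other Constants : The moments (about mean) of the Poisson distribution are :

$$\mu_1 = 0, \quad \mu_2 = \text{Variance} = m, \quad \mu_3 = m, \quad \mu_4 = m + 3m^{2}$$

Hence,

$$\beta_1 = \frac{\mu_3^{2}}{\mu_2^{3}} = \frac{m^{2}}{m^{3}} = \frac{1}{m} \qquad (14.16)$$

$$\gamma_1 = +\sqrt{\beta_1} = \frac{1}{\sqrt{m}} \qquad (14.16a)$$

$$\beta_2 = \frac{\mu_4}{\mu_2^{2}} = \frac{m + 3m^{2}}{m^{2}} = 3 + \frac{1}{m} \qquad (14.17)$$

$$\gamma_2 = \beta_2 - 3 = \frac{1}{m} \qquad (14.17a)$$

Remarks. 1. As $m \to \infty$, $\beta_1 \to 0$, $\gamma_1 \to 0$, $\beta_2 \to 3$ and $\gamma_2 \to 0$.

2. Since $\mu_3 = m > 0$, from (14.16a) we observe that $\gamma_1 > 0$. This means that Poisson distribution is a positively skewed distribution. As the value of the parameter m increases, $\gamma_1$ decreases and thus skewness is reduced for increasing values of m. In particular as $m \to \infty$ (large values of m), $\gamma_1 \to 0$ and consequently the distribution tends to be symmetrical for large m.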
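The constants derived in this section can be verified numerically by summing the series directly. The following is a rough check rather than a proof: the function name `poisson_central_moments`, the truncation at 200 terms and the trial value m = 2.5 are all assumptions made for the illustration.

```python
from math import exp, sqrt


def poisson_central_moments(m, terms=200):
    """Mean and central moments mu2, mu3, mu4 of a Poisson(m) variate.

    The infinite series is truncated after `terms` terms; probabilities
    are built with the recurrence p(r) = p(r - 1) * m / r.
    """
    probs = [exp(-m)]
    for r in range(1, terms):
        probs.append(probs[-1] * m / r)
    mean = sum(r * p for r, p in enumerate(probs))

    def mu(k):
        return sum((r - mean) ** k * p for r, p in enumerate(probs))

    return mean, mu(2), mu(3), mu(4)


m = 2.5                                   # assumed trial value
mean, mu2, mu3, mu4 = poisson_central_moments(m)
beta1 = mu3**2 / mu2**3                   # should equal 1/m
gamma1 = sqrt(beta1)                      # should equal 1/sqrt(m)
beta2 = mu4 / mu2**2                      # should equal 3 + 1/m
print(mean, mu2, mu3, mu4)
print(beta1, gamma1, beta2)
```

For m = 2.5 the computed values agree with mean = variance = m, $\mu_3 = m$, $\mu_4 = m + 3m^2$, $\beta_1 = 1/m$ and $\beta_2 = 3 + 1/m$ up to rounding error.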

14.3.3. Mode of Poisson Distribution : The Poisson distribution has mode at X = r, if $p(r) \ge p(r-1)$ and $p(r) \ge p(r+1)$.

Case (i). When m is an integer. If m is an integer, equal to k (say), then the Poisson distribution is bi-modal, the two modes being at the points X = k and X = k - 1.

Case (ii). When m is not an integer. If m is not an integer then the distribution is unimodal, the unique modal value being the integral part of m. For example, if m = 5.6, then mode is 5, the integral part of 5.6.
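The two cases for the mode translate into a short rule. The sketch below is illustrative only; the function names are hypothetical, and the brute-force comparison at the end is just one way of checking the rule against the probabilities themselves.

```python
from math import exp, factorial, floor


def poisson_pmf(r, m):
    """P(X = r) for a Poisson variate with parameter m."""
    return exp(-m) * m**r / factorial(r)


def poisson_modes(m):
    """Mode(s) of the Poisson distribution with parameter m > 0.

    Integer m = k gives two modes, k - 1 and k; otherwise the single
    mode is the integral part of m.
    """
    if m >= 1 and m == floor(m):
        k = int(m)
        return [k - 1, k]
    return [floor(m)]


print(poisson_modes(5.6))   # unimodal case from the text
print(poisson_modes(4))     # integer parameter: two modes

# Brute-force check of the rule for m = 5.6 over r = 0..29.
best_r = max(range(30), key=lambda r: poisson_pmf(r, 5.6))
print(best_r)
```

In the integer case the two modal probabilities are equal, since p(k) = p(k - 1) × m/k and m/k = 1.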

Example 14.11. Comment on the following :

For a Poisson distribution, Mean = 8 and Variance = 7.

[I.C.W.A. (Final), Dec. 1977]
Solution. The given statement is wrong, since for a Poisson distribution mean and variance are equal.


Example 14.12. Between the hours 2 P.M. and 4 P.M. the 


[I.C.W.A. (Final), June 1984]


Solution. If the random variable X denotes the number of telephone calls per minute, then X will follow Poisson distribution with parameter m = 2.35 and probability function :

$$P(X = r) = \frac{e^{-m}\, m^{r}}{r!} = \frac{e^{-2.35} (2.35)^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots \qquad \ldots(*)$$

The probability that during one particular minute there will be at most 2 phone calls is given by :

$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$$

$$= e^{-2.35}\left(1 + 2.35 + \frac{(2.35)^{2}}{2!}\right) \qquad \text{[From (*)]}$$

$$= 0.095374 \times (1 + 2.35 + 2.76125)$$
$$= 0.095374 \times 6.11125$$
$$= 0.5828543$$

Example 14.13. It is known from past experience that in a 
certain plant there і i 


month. Find the Probability that in a given у 
than 4 accidents, Assume Poisson distribution, (e1—0.0183) 


(Bombay Uni. B.Com., October 1981) 


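Solution. In the usual notations we are given m = 4. If the random variable X denotes the number of accidents in the plant per month, then by Poisson probability law

$$P(X = r) = \frac{e^{-m}\, m^{r}}{r!} = \frac{e^{-4}\, 4^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots \qquad \ldots(*)$$

The required probability that there will be less than 4 accidents is given by

$$P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$$

$$= e^{-4}\left[1 + 4 + \frac{4^{2}}{2!} + \frac{4^{3}}{3!}\right] \qquad \text{[From (*)]}$$

$$= e^{-4}\left[1 + 4 + 8 + 10.67\right]$$

$$= e^{-4} \times 23.67 = 0.0183 \times 23.67 = 0.4332$$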

actured by 
1 Probability 

that ina sample of 100 bulbs (i) none is defective, (ii) 5 bulbs will be 


1) December 1979) 


Solution. Here we are given : n = 100,
p = Probability of a defective bulb = 5% = 0.05
Since p is small and n is large we may approximate the given distribution by Poisson distribution. Hence the parameter m of the Poisson distribution is :

m = np = 100 × 0.05 = 5

Let the random variable X denote the number of defective bulbs in a sample of 100. Then (by Poisson law),

$$P(X = r) = \frac{e^{-m}\, m^{r}}{r!} = \frac{e^{-5}\, 5^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots \qquad \ldots(*)$$

(i) The probability that none of the bulbs is defective is given by :

$$P(X = 0) = e^{-5} = 0.007 \qquad \text{[From (*)]}$$

(ii) The probability of 5 defective bulbs is given by :

$$P(X = 5) = \frac{e^{-5} \times 5^{5}}{5!} = \frac{0.007 \times 625}{24} = \frac{4.375}{24} = 0.1823$$

Example 14.15. A manufacturer of blades knows that 5% of his product is defective. If he sells blades in boxes of 100, and guarantees that not more than 10 blades will be defective, what is the probability (approximately) that a box will fail to meet the guaranteed quality ? [I.C.W.A. (Final), June 1978]
Solution. p = Probability of a defective blade = 5% = 0.05.
Since the probability of a defective blade is small, we may use Poisson distribution. In the usual notations we are given n = 100.
Hence m = np = 100 × 0.05 = 5
If the random variable X denotes the number of defective blades in a box of 100, then (by Poisson probability law)

$$P(X = r) = \frac{e^{-m}\, m^{r}}{r!} = \frac{e^{-5}\, 5^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots$$

A box will fail to meet the guaranteed quality if the number of defectives in it is more than 10. Hence the required probability is :

$$P(X > 10) = 1 - P(X \le 10) = 1 - \sum_{r=0}^{10} p(r)$$

$$= 1 - \sum_{r=0}^{10} \frac{e^{-5}\, 5^{r}}{r!} = 1 - e^{-5} \sum_{r=0}^{10} \frac{5^{r}}{r!}$$
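The answer to Example 14.15 is left as a sum; it can be evaluated numerically, and compared with the exact binomial figure that the Poisson law approximates. The sketch below is only an illustration: the function names `poisson_cdf` and `binomial_cdf` are hypothetical, and the comparison with B(100, 0.05) is an addition for context rather than part of the worked solution.

```python
from math import comb, exp


def poisson_cdf(k, m):
    """P(X <= k) for a Poisson variate, via p(r) = p(r - 1) * m / r."""
    term = exp(-m)
    total = term
    for r in range(1, k + 1):
        term *= m / r
        total += term
    return total


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


# Box of 100 blades, 5% defective, guarantee: not more than 10 defectives.
poisson_tail = 1 - poisson_cdf(10, 5.0)           # Poisson approximation
binomial_tail = 1 - binomial_cdf(10, 100, 0.05)   # exact binomial value
print(f"Poisson approximation : {poisson_tail:.4f}")
print(f"Exact binomial        : {binomial_tail:.4f}")
```

Under the Poisson approximation the probability that a box fails the guarantee comes to roughly 0.014, i.e. a little over one box in a hundred.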
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i 1 lenses, 

Example 14.16. In a certain factory turning out optical lenses, there is a small chance 1/500 for any one lens to be defective. The lenses are supplied in packets of 10. Use Poisson distribution to calculate the approximate number of packets containing no defective, one defective, two defective, three defective lenses respectively in a consignment of 20,000 packets. You are given that $e^{-0.02} = 0.9802$.

Solution. In the usual notations we are given :

N = 20,000 ; n = 10 and

p = Probability of a defective optical lens $= \frac{1}{500}$

$$m = np = 10 \times \frac{1}{500} = \frac{1}{50} = 0.02$$

Let the random variable X denote the number of defective optical lenses in a packet of 10. Then by Poisson probability law, the probability of r defective lenses in a packet is given by :

$$P(X = r) = \frac{e^{-0.02} (0.02)^{r}}{r!} = \frac{0.9802 \times (0.02)^{r}}{r!}$$

Hence in a consignment of 20,000 packets the frequency (number) of packets containing r defective lenses is

$$N \cdot P(X = r) = 20000 \times \frac{0.9802 \times (0.02)^{r}}{r!} \qquad \ldots(*)$$

Putting r = 0, 1, 2 and 3 in (*), we get
No. of packets containing no defective lens
= 20000 × 0.9802 = 19604
No. of packets containing 1 defective lens is

$$20000 \times 0.9802 \times (0.02) = 19604 \times 0.02 = 392.08 \simeq 392$$

No. of packets containing 2 defective lenses is

$$\frac{20000 \times 0.9802 \times (0.02)^{2}}{2!} = \frac{392.08 \times 0.02}{2} = 3.9208 \simeq 4$$

No. of packets containing 3 defective lenses is

$$\frac{20000 \times 0.9802 \times (0.02)^{3}}{3!} = \frac{3.9208 \times 0.02}{3} = 0.0261 \simeq 0,$$

since the number of packets cannot be in fraction.
Hence the number of packets containing 0, 1, 2, and 3 defective lenses is respectively 19604, 392, 4, 0.


14.3.4. Fitting of Poisson Distribution. If we want to fit a Poisson distribution to a given frequency distribution, we compute the mean $\bar{x}$ of the given distribution and take it equal to the mean of the fitted (Poisson) distribution, i.e., we take $m = \bar{x}$. Once m is known, the various probabilities of the Poisson distribution can be obtained, the general formula being

$$p(r) = P(X = r) = \frac{e^{-m} \times m^{r}}{r!} \ ; \quad r = 0, 1, 2, \ldots \qquad (14.18)$$

If N is the total observed frequency, then the expected or theoretical frequencies of the Poisson distribution are given by N × p(r).

Expected frequencies can be very conveniently computed as explained in the following table :

Value of Variable (r)    Probability p(r)                                        Expected or Theoretical Frequencies f(r) = N p(r)
0                        $p(0) = e^{-m}$                                         $f(0) = N p(0) = N e^{-m}$
1                        $p(1) = m\, e^{-m} = m\, p(0)$                          $f(1) = N p(1) = m\, f(0)$
2                        $p(2) = \frac{m^{2} e^{-m}}{2!} = \frac{m}{2}\, p(1)$   $f(2) = N p(2) = \frac{m}{2}\, f(1)$
3                        $p(3) = \frac{m^{3} e^{-m}}{3!} = \frac{m}{3}\, p(2)$   $f(3) = N p(3) = \frac{m}{3}\, f(2)$
4                        $p(4) = \frac{m^{4} e^{-m}}{4!} = \frac{m}{4}\, p(3)$   $f(4) = N p(4) = \frac{m}{4}\, f(3)$

Example 14.17. A systematic sample of 100 pages was taken from the Concise Oxford Dictionary and the observed frequency distribution of foreign words per page was found to be as follows :


No. of foreign words per page (X) 


Frequency (f) 


Calculate the expected frequencies using Poisson distribution. 
Also compute the variance of fitted distribution. 


Solution.
FITTING OF POISSON DISTRIBUTION
If the above distribution is approximated by a Poisson distribution, then the parameter of Poisson distribution is given by $m = \bar{x} = 0.99$ and by Poisson probability law, the frequency (number) of pages containing r foreign words is given by :

$$N p(r) = N \cdot P(X = r) = 100 \times \frac{e^{-0.99} (0.99)^{r}}{r!}$$

Putting r = 0, 1, 2, ..., 6, we get the expected frequencies of Poisson distribution.

$$N \cdot p(0) = 100 \times e^{-0.99}$$
$$= 100 \times \text{Antilog}\,[-0.99 \log_{10} e]$$
$$= 100 \times \text{Antilog}\,[-0.99 \times \log_{10} 2.718]$$
$$= 100 \times \text{Antilog}\,[-0.99 \times 0.4343]$$
$$= 100 \times \text{Antilog}\,[-0.429957]$$
$$= 100 \times \text{Antilog}\,[\bar{1}.570043] = 100 \times 0.3716$$
$$= 37.16$$

$$N p(1) = N p(0) \times \frac{0.99}{1} = 37.16 \times 0.99 = 36.7884$$
$$N p(2) = N p(1) \times \frac{0.99}{2} = 36.7884 \times 0.495 = 18.21$$
$$N p(3) = N p(2) \times \frac{0.99}{3} = 18.21 \times 0.33 = 6.0093$$
$$N p(4) = N p(3) \times \frac{0.99}{4} = 6.0093 \times 0.2475 = 1.4873$$
$$N p(5) = N p(4) \times \frac{0.99}{5} = 1.4873 \times 0.198 = 0.2945$$
$$N p(6) = N p(5) \times \frac{0.99}{6} = 0.2945 \times 0.165 = 0.0486$$

Hence the theoretical (expected) frequencies of the Poisson distribution are :

r :                       0       1       2       3      4      5      6
Expected frequencies :    37.16   36.79   18.21   6.01   1.49   0.29   0.05
(Rounded) :               37      37      18      6      2      0      0

Since for Poisson distribution, mean and variance are equal, the mean and variance of theoretical (fitted) distribution is given by :

Mean = Variance = m = $\bar{x}$ = 0.99
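The recurrence f(r) = (m/r) × f(r - 1) used in this fitting is easy to mechanise. The sketch below is a minimal illustration with the values m = 0.99 and N = 100 quoted in the solution; the function name `poisson_expected` is hypothetical, and `exp` replaces the antilog-table step.

```python
from math import exp


def poisson_expected(m, total, r_max):
    """Expected Poisson frequencies N * p(r) for r = 0..r_max.

    Starts from f(0) = N * exp(-m) and applies f(r) = f(r - 1) * m / r.
    """
    freqs = [total * exp(-m)]
    for r in range(1, r_max + 1):
        freqs.append(freqs[-1] * m / r)
    return freqs


expected = poisson_expected(0.99, 100, 6)
for r, f in enumerate(expected):
    print(f"r = {r}: expected frequency = {f:.2f}")
```

The values agree with the hand computation to two decimal places. Simple rounding of 1.49 would give 1 rather than 2, so the rounded row in the table reflects an adjustment that keeps the total at 100.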
EXERCISE 14.2 


1. (a) What is Poisson distribution ? Under what conditions is it 
applicable ? 


(b) Give a few practical situations where you would expect Poisson law to hold.

2. (a) Obtain Poisson distribution as a limiting case of the Binomial distribution.

(b) Prove that for the Poisson distribution, mean and variance are equal.
(c) Obtain the mean and variance of binomial distribution with parameters n and p. Hence deduce the mean and variance of Poisson distribution.

3. State the distinctive features of the Poisson probability distribution. 
When does this tend to a normal distribution ? 
[Delhi Uni. B.Com. (Hons.), 1983] 


4. Write down the probability function of a Poisson distribution whose mean is 2. What is its variance ? Give 4 examples of Poisson variable.
(Bombay Uni. B.Com., April 1981)
Ans. Variance = 2.
5. The standard deviation of a Poisson distribution is 2. Find the probability that X = 3. (Given $e^{-4} = 0.0183$).
[I.C.W.A. (Final), Dec. 1980]

Ans. 0.1952

6. Define a Poisson distribution.

If X be a Poisson variate with parameter 1, find Pr(3 < X < 5). [Given $e^{-1} = 0.36783$]. [I.C.W.A. (Final), June 1983]

Ans. 0.0153


7. Between the hours of 2 and 4 P.M., the average number of phone calls per minute coming into the switch-board of a company is 2.5. Find the probability that during one particular minute there will be :

(i) no phone call at all,
(ii) exactly 3 calls,
(iii) at least 5 calls.

(Given $e^{-2} = 0.13534$ and $e^{-0.5} = 0.60650$)
[C.A. (Final), June 1976]

Ans. (i) 0.0821, (ii) 0.2138, (iii) $1 - e^{-2.5} \sum_{r=0}^{4} \frac{(2.5)^{r}}{r!}$


8. Write the probability function of Poisson distribution. Give two examples of Poisson variate. Accidents occur on a particular stretch of highway at an average rate of 3 per week. What is the probability that there will be exactly two accidents in a given week ? (Given $e^{3} = 20.08$).

(Bombay Uni. B.Com., April 1983)

Ans. $\frac{9}{2 e^{3}} = 0.2241$


9. Find the probability that at most 5 defective bolts will be found in a box of 200 bolts if it is known that 2 per cent of such bolts are expected to be defective. (You may take the distribution to be Poisson). Take $e^{-4} = 0.0183$.
[I.C.W.A. (Final), June 1980]

Ans. $e^{-4}\left(1 + 4 + \frac{4^{2}}{2!} + \frac{4^{3}}{3!} + \frac{4^{4}}{4!} + \frac{4^{5}}{5!}\right) = 0.7845$
10. If 2 per cent of electric bulbs manufactured by a certain company are defective, find the probability that in a sample of 200 bulbs, (i) less than 2 bulbs, (ii) more than 3 bulbs, are defective. [Given $e^{-4} = 0.0183$]


U.C. A. Untermediate), December 1935) 
Ans, (i) 0.0915, (ii) 0.5669 


11. In a certain factory turning out razor blades there is a small chance
1/500 for any blade to be defective. The blades are supplied in packets of 10.
Use Poisson distribution to calculate the approximate number of packets
containing no defective, one defective and two defective blades respectively in a
consignment of 10,000 packets.

(Allahabad Uni. M.A. Econ., 1977)
Ans. 9802, 196, 2


12. A manufacturer of pins knows that on an average 5% of his product
is defective. He sells pins in boxes of 100, and guarantees that not more than
4 pins will be defective. What is the probability that a box will meet the
guaranteed quality ? ($e^{-5} = 0.0067$).

[I.C.W.A. (Final), Dec. 1982]

Ans. $e^{-5}\sum_{r=0}^{4} \dfrac{5^r}{r!} \approx 0.438$


13. A distributor of bean seeds determines from extensive tests that 5%
of large batch of seeds will not germinate. He sells the seeds in packets of 200
and guarantees 90% germination. Determine the probability that a particular
packet will violate the guarantee.

(Meerut Uni. M. Com., 1975)


Ans, 1—e—, 


6 
EZ 4a/x |1107 
x=0 


(ii) at least three misprints ? 
Ans. 1—2/e=0.264 ; 1—2.5/e=0.0802 


16. In a Poisson distribution, the probability for x = 0 is 10 per
cent. Find the mean of the distribution.
Ans. 2.3026

17. If X is a Poisson variate and $P(X = 0) = P(X = 1) = k$, show that $k = 1/e$.

De-Moivre proved that under the above two conditions, the
distribution of standard Binomial variate

$$Z = \frac{X - E(X)}{\sigma_X} = \frac{X - np}{\sqrt{npq}}$$

tends to the distribution of standard Normal variate as given in
(14.21).

If p and q are nearly equal (i.e., p is nearly 1/2), then the normal
approximation is surprisingly good even for small values of n.
When p and q are not nearly equal, the approximation still holds, but
the value of n needed is quite large as compared to the value of n
required in the case when p and q are nearly equal. Thus, the normal
approximation to the Binomial distribution is better for increasing
values of n and is exact in the limiting case as $n \to \infty$.
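The effect of p on the quality of the approximation can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names are hypothetical, and the +0.5 continuity correction is an assumption of this sketch rather than something stated above.

```python
import math
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    q = 1.0 - p
    return sum(math.comb(n, r) * p**r * q**(n - r) for r in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) through Z = (X - np) / sqrt(npq).

    The +0.5 continuity correction is an addition of this sketch.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return NormalDist().cdf((k + 0.5 - mean) / sd)


# p = 1/2: a small n already gives a close match.
exact_sym = binomial_cdf(6, 10, 0.5)
approx_sym = normal_approx_cdf(6, 10, 0.5)

# p = 0.1 with the same small n: the match is visibly poorer.
exact_skew_small = binomial_cdf(1, 10, 0.1)
approx_skew_small = normal_approx_cdf(1, 10, 0.1)

# p = 0.1 with a much larger n: the match improves again.
exact_skew_large = binomial_cdf(50, 500, 0.1)
approx_skew_large = normal_approx_cdf(50, 500, 0.1)

for label, exact, approx in [
    ("n=10,  p=0.5", exact_sym, approx_sym),
    ("n=10,  p=0.1", exact_skew_small, approx_skew_small),
    ("n=500, p=0.1", exact_skew_large, approx_skew_large),
]:
    print(f"{label}: exact = {exact:.4f}, normal approx = {approx:.4f}")
```

The three cases are chosen only for illustration; any other n, p and k could be substituted.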


14.3.3. Relation between Poisson and Normal Distributions.
If X is a random variable following Poisson distribution with parameter
m, then

$E(X) = \text{Mean} = m$

and $\operatorname{Var}(X) = \sigma^2 = m$

Thus standard Poisson variate becomes

$$Z = \frac{X - E(X)}{\sigma_X} = \frac{X - m}{\sqrt{m}}$$

It has been proved that this variate tends to be a Standard
Normal Variate if $m \to \infty$. Thus Normal distribution may also be
regarded as a limiting case of Poisson distribution as the parameter
$m \to \infty$.
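The limiting behaviour can be illustrated numerically. The sketch below (Python, with hypothetical helper names that do not come from the text) compares the exact Poisson distribution function with the standard normal distribution function evaluated at $(k - m)/\sqrt{m}$, and records the largest gap for a few illustrative values of m.

```python
import math
from statistics import NormalDist


def poisson_cdf(k: int, m: float) -> float:
    """Exact P(X <= k) for X ~ Poisson(m), built term by term."""
    term = math.exp(-m)          # P(X = 0)
    total = term
    for r in range(1, k + 1):
        term *= m / r            # P(X = r) = P(X = r - 1) * m / r
        total += term
    return total


def max_gap(m: float) -> float:
    """Largest |P(X <= k) - P(Z <= (k - m)/sqrt(m))| over a wide range of k."""
    sd = math.sqrt(m)
    upper = int(m + 6 * sd) + 1
    return max(
        abs(poisson_cdf(k, m) - NormalDist().cdf((k - m) / sd))
        for k in range(upper + 1)
    )


# The values of m below are illustrative choices only.
gaps = {m: max_gap(m) for m in (4, 25, 100, 400)}
for m, gap in gaps.items():
    print(f"m = {m:3d}: largest gap = {gap:.4f}")
```

In this sketch the gap shrinks steadily as m grows, which is the sense in which the Normal distribution is a limiting case of the Poisson distribution.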


14.3.4. Properties of Normal Distribution. The normal probability
curve with mean $\mu$ and standard deviation $\sigma$ is given by

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty$$ ...(*)

The standard normal probability curve is given by the equation

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$$ ...(**)

It has the following properties :
1. The graph of p(x) is the famous bell shaped curve as
shown in the diagram. The top of the bell is directly above the
mean ($\mu$).

[Fig. 14.1. NORMAL PROBABILITY CURVE: p(x) plotted against x, with the peak at $X = \mu$]

2. The curve is symmetrical about the line $X = \mu$ ($Z = 0$), i.e.,
it has the same shape on either side of the line $X = \mu$ (or $Z = 0$).

This is because the equation of the curve $\phi(z)$ remains unchanged
if we change z to $-z$.

3. Since the distribution is symmetrical, mean, median and
mode coincide. Thus,

Mean = Median = Mode = $\mu$

4. Since Mean = Median = $\mu$, the ordinate at $X = \mu$ ($Z = 0$)
divides the whole area into two equal parts. Further, since total
area under normal probability curve is 1, the area to the right of
the ordinate as well as to the left of the ordinate at $X = \mu$ (or $Z = 0$)
is 0.5.


5. Also, by virtue of symmetry the quartiles are equidistant
from median ($\mu$), i.e.,

$Q_3 - Md = Md - Q_1 \Rightarrow Q_1 + Q_3 = 2Md = 2\mu$ ...(14.22)


6. Since the distribution is symmetrical, the moment coefficient
of skewness is given by :

$\beta_1 = 0 \Rightarrow \gamma_1 = 0$ ...(14.23)

7. The coefficient of kurtosis is given by :

$\beta_2 = 3 \Rightarrow \gamma_2 = 0$ ...(14.24)


8. No portion of the curve lies below the x-axis, since p(x)
being the probability can never be negative.

9. Theoretically, the range of the distribution is from $-\infty$ to
$\infty$. But practically, Range = $6\sigma$.


10. As x increases numerically [i.e., on either side of $X = \mu$],
the value of p(x) decreases rapidly, the maximum probability
occurring at $x = \mu$ and is given by [Put $x = \mu$ in (*)]

$[p(x)]_{\max} = \dfrac{1}{\sigma\sqrt{2\pi}}$ ...(14.25)

Thus maximum value of p(x) is inversely proportional to the
standard deviation. For large values of $\sigma$, p(x) decreases, i.e., the
curve tends to flatten out and for small values of $\sigma$, p(x) increases,
i.e., the curve has a sharp peak.


11. Distribution is unimodal, the only mode occurring at
$X = \mu$.

12. Since the distribution is symmetrical, all moments of odd
order about the mean are zero. Thus

$\mu_{2n+1} = 0$ ; (n = 0, 1, 2, ...) ...(14.26)
i.e., $\mu_1 = \mu_3 = \mu_5 = \dots = 0$ ...(14.26a)

13. The moments (about mean) of even order are given by :

$\mu_{2n} = 1 \cdot 3 \cdot 5 \cdots (2n-1)\,\sigma^{2n}$ , (n = 1, 2, 3, ...) ...(14.27)

Putting n = 1 and 2 we get

$\mu_2 = \sigma^2$ and $\mu_4 = 1 \cdot 3\,\sigma^4 = 3\sigma^4$ ...(14.28)

$\therefore\ \beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = 0$ and $\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{3\sigma^4}{\sigma^4} = 3$ ...(14.28a)


14. X-axis is an asymptote to the curve, i.e., for numerically
large value of X (on either side of the line $X = \mu$), the curve becomes
parallel to the X-axis and is supposed to meet it at infinity.


15. A linear combination of independent normal variates is
also a normal variate. If $X_1, X_2, \dots, X_n$ are independent normal
variates with means $\mu_1, \mu_2, \dots, \mu_n$ and standard deviations
$\sigma_1, \sigma_2, \dots, \sigma_n$ respectively then their linear combination

$a_1X_1 + a_2X_2 + \dots + a_nX_n$ ...(14.29)

where $a_1, a_2, \dots, a_n$ are constants, is also a normal variate with

Mean = $a_1\mu_1 + a_2\mu_2 + \dots + a_n\mu_n$
Variance = $a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_n^2\sigma_n^2$ ...(14.29a)

In particular, if we take $a_1 = a_2 = \dots = a_n = 1$ in (14.29) then we
get : $X_1 + X_2 + \dots + X_n$ is a normal variate with mean
$\mu_1 + \mu_2 + \dots + \mu_n$ and variance $\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2$.

Thus, the sum of independent normal variates is also a normal
variate. This is known as the 'Re-productive or Additive Property'
of the Normal distribution.

If we take $a_1 = a_2 = 1$ (with n = 2), then we have from (14.29) and (14.29a) :

$X_1 + X_2$ is a normal variate with mean $\mu_1 + \mu_2$ and variance
$\sigma_1^2 + \sigma_2^2$.

Further if we take $a_1 = 1$ and $a_2 = -1$ in (14.29) and (14.29a)
we get :

$X_1 - X_2$ is a normal variate with mean $\mu_1 - \mu_2$ and variance
$\sigma_1^2 + \sigma_2^2$.

Hence the sum as well as the difference of independent normal
variates is a normal variate.


16. Mean Deviation (M.D.) about mean or median or mode
[$\because$ M = Md = Mo] is given by :

M.D. $= \sqrt{\dfrac{2}{\pi}}\,\sigma = 0.7979\,\sigma \simeq \dfrac{4}{5}\,\sigma$ ...(14.30)

17. Quartiles are given (in terms of $\mu$ and $\sigma$) by :

$Q_1 = \mu - 0.6745\,\sigma$ and $Q_3 = \mu + 0.6745\,\sigma$ ...(14.31)

18. Quartile deviation (Q.D.) is given by

Q.D. $= \dfrac{Q_3 - Q_1}{2} = 0.6745\,\sigma \simeq \dfrac{2}{3}\,\sigma$ [From (14.31)] ...(14.32)

Also

Q.D. $\simeq \dfrac{2}{3}\,\sigma = \dfrac{2}{3} \times \dfrac{5}{4}$ M.D. $= \dfrac{5}{6}$ M.D. [From (14.30)]

$\therefore$ Q.D. $= \dfrac{5}{6}$ M.D. ...(14.33)

19. We have (approximately) :

Q.D. : M.D. : S.D. :: $\dfrac{2}{3}\,\sigma : \dfrac{4}{5}\,\sigma : \sigma$
$\Rightarrow$ Q.D. : M.D. : S.D. :: 10 : 12 : 15 ...(14.34)

20. From (14.30) and (14.33) we also have

4 S.D. = 5 M.D. = 6 Q.D. ...(14.35)


21. Points of inflexion of the normal curve are at $X = \mu \pm \sigma$,
i.e., they are equidistant from mean at a distance of $\sigma$ and are
given by :

$x = \mu \pm \sigma$, $\quad p(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-1/2}$


22. Area Property. One of the most fundamental properties
of the normal probability curve is the area property. The area
under the normal probability curve between the ordinates at
$X = \mu - \sigma$ and $X = \mu + \sigma$ is 0.6826. In other words, the range $\mu \pm \sigma$
covers 68.26% of the observations.

The area under the normal probability curve between the
ordinates at $X = \mu - 2\sigma$ and $X = \mu + 2\sigma$ is 0.9544, i.e., the range $\mu \pm 2\sigma$
covers more than 95% of the observations.

The area under the normal probability curve between the
ordinates at $X = \mu - 3\sigma$ and $X = \mu + 3\sigma$ is 0.9973, i.e., the range
$\mu \pm 3\sigma$ covers 99.73% of the observations. Hence, for practical
purposes, the range $\mu \pm 3\sigma$ covers the entire area, which is 1 [or all the
observations].


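The standard normal variate corresponding to X is $Z = \dfrac{X - \mu}{\sigma}$.

When $X = \mu + \sigma$, $Z = \dfrac{\mu + \sigma - \mu}{\sigma} = 1$

When $X = \mu - \sigma$, $Z = \dfrac{\mu - \sigma - \mu}{\sigma} = -1$

When $X = \mu + 2\sigma$, $Z = \dfrac{\mu + 2\sigma - \mu}{\sigma} = 2$

When $X = \mu - 2\sigma$, $Z = \dfrac{\mu - 2\sigma - \mu}{\sigma} = -2$

When $X = \mu + 3\sigma$, $Z = \dfrac{\mu + 3\sigma - \mu}{\sigma} = 3$

When $X = \mu - 3\sigma$, $Z = \dfrac{\mu - 3\sigma - \mu}{\sigma} = -3$

Hence the area under the standard normal probability curve
(i) Between the ordinates at $Z = \pm 1$ is 0.6826.
(ii) Between the ordinates at $Z = \pm 2$ is 0.9544.
(iii) Between the ordinates at $Z = \pm 3$ is 0.9973.
These areas are exhibited in the Fig. 14.2

[Fig. 14.2. AREAS UNDER NORMAL PROBABILITY CURVE: horizontal axis marked at $X = \mu - 3\sigma, \mu - 2\sigma, \mu - \sigma, \mu, \mu + \sigma, \mu + 2\sigma, \mu + 3\sigma$, i.e., $Z = -3, -2, -1, 0, 1, 2, 3$; the span from $Z = -3$ to $Z = 3$ is labelled 99.73%]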


The following table gives the areas under the normal probability
curve for some important values of Z :

Distance from the mean ordinate        Area under the curve
in terms of $\pm\sigma$

$Z = \pm 0.6745$                       50% = 0.50
$Z = \pm 1.00$                         68.26% = 0.6826
$Z = \pm 1.96$                         95% = 0.95
$Z = \pm 2.58$                         99% = 0.99
$Z = \pm 3.00$                         99.73% = 0.9973

Remark. These values of Z and the corresponding areas 
under the normal probability curve are of great practical utility in 
Statistics and should be committed to memory. 
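Several of the properties listed above can be checked numerically. The following is a minimal sketch in Python using the standard library's `statistics.NormalDist`; the values $\mu = 50$, $\sigma = 15$ and the helper names are illustrative assumptions, not part of the text.

```python
import math
from statistics import NormalDist

mu, sigma = 50.0, 15.0                  # illustrative values only
dist = NormalDist(mu, sigma)

# Property 10: the maximum ordinate is 1 / (sigma * sqrt(2 * pi)).
peak = dist.pdf(mu)

# Properties 17 and 18: quartiles and quartile deviation.
q1 = dist.inv_cdf(0.25)
q3 = dist.inv_cdf(0.75)
quartile_deviation = (q3 - q1) / 2


def mean_abs_deviation(steps: int = 20000) -> float:
    """Property 16: integrate |x - mu| p(x) by the trapezoidal rule."""
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * abs(x - mu) * dist.pdf(x)
    return total * h


def area_within(k: float) -> float:
    """Property 22: area between mu - k*sigma and mu + k*sigma."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


print(f"peak ordinate       = {peak:.6f}")
print(f"Q.D. / sigma        = {quartile_deviation / sigma:.4f}")
print(f"M.D. / sigma        = {mean_abs_deviation() / sigma:.4f}")
for k in (1, 2, 3):
    print(f"area within {k} sigma = {area_within(k):.4f}")
```

The areas computed this way may differ from the tabulated 0.6826 and 0.9544 in the fourth decimal place, because the tabulated figures are obtained by doubling four-figure table entries.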


14.3.5. How to Compute Areas Under Normal Probability Curve.
Mathematically, the area bounded by the curve p(x), the x-axis
and the ordinates at $X = a$ and $X = b$ is given by the definite integral

$$\int_a^b p(x)\,dx$$

But since p(x) is probability density function, it is represented
by

$$P(a < X < b) = \int_a^b p(x)\,dx$$ ...(14.36)

and is shown in the Fig. 14.3

[Fig. 14.3: area under p(x) between the ordinates at $X = a$ and $X = b$]


Let us now try to compute the areas under the normal probability
curve.

$$P(\mu < X < a) = \int_\mu^a p(x)\,dx$$

is the area under the normal curve (14.19) enclosed by x-axis
and the ordinates at $X = \mu$ and $X = a$ as shown below :

[Fig. 14.4: the area $P(\mu < X < a)$ between the ordinates at $X = \mu$ ($Z = 0$) and $X = a$ ($Z = z_1$)]

When $X = \mu$, $Z = \dfrac{\mu - \mu}{\sigma} = 0$

When $X = a$, $Z = \dfrac{a - \mu}{\sigma} = z_1$ (say),

$$\therefore\ P(\mu < X < a) = P(0 < Z < z_1) = \int_0^{z_1} \phi(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-z^2/2}\,dz$$ ...(14.37)

This definite integral, which gives the area under the standard
normal probability curve bounded by z-axis and the ordinates at
$Z = 0$ and $Z = z_1$, has been evaluated and tabulated for different
values of $z_1$ at the intervals of 0.01 and is given in Table VI
in the Appendix at the end of the book.

[Fig. 14.5: the area between the ordinates at $X = \mu - \sigma$ ($Z = -1$) and $X = \mu + \sigma$ ($Z = 1$)]
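In particular we have :

$P(\mu - \sigma < X < \mu + \sigma) = P(-1 < Z < 1)$
$= 2P(0 < Z < 1)$ [By symmetry]
$= 2 \times 0.3413$ [From normal tables]
$= 0.6826$

Similarly,

$P(\mu - 1.96\sigma < X < \mu + 1.96\sigma) = P(-1.96 < Z < 1.96)$
$= 2P(0 < Z < 1.96)$
$= 2 \times 0.4750 = 0.95$

$P(\mu - 2\sigma < X < \mu + 2\sigma) = P(-2 < Z < 2)$
$= 2P(0 < Z < 2)$
$= 2 \times 0.4772 = 0.9544$

$P(\mu - 2.58\sigma < X < \mu + 2.58\sigma) = P(-2.58 < Z < 2.58)$
$= 2P(0 < Z < 2.58)$
$= 2 \times 0.495 = 0.99$

$P(\mu - 3\sigma < X < \mu + 3\sigma) = P(-3 < Z < 3)$
$= 2P(0 < Z < 3)$
$= 2 \times 0.49865 = 0.9973$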
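Where the printed table is not at hand, the tabulated integral in (14.37) can be reproduced with the error function, since $P(0 < Z < z_1) = \tfrac{1}{2}\,\mathrm{erf}(z_1/\sqrt{2})$. The sketch below is in Python; the function name `area_0_to_z` is a hypothetical stand-in for a table look-up and is not from the text.

```python
import math


def area_0_to_z(z1: float) -> float:
    """P(0 < Z < z1) for a standard normal variate Z, with z1 >= 0."""
    return 0.5 * math.erf(z1 / math.sqrt(2.0))


# Reproduce the entries used in the computations above.
for z1 in (1.0, 1.96, 2.0, 2.58, 3.0):
    one_side = area_0_to_z(z1)
    print(f"z1 = {z1:4.2f}   P(0<Z<z1) = {one_side:.5f}   "
          f"P(-z1<Z<z1) = {2 * one_side:.5f}")
```

Small differences from the printed figures in the last decimal place are to be expected, since the table entries are rounded to four places before doubling.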
Remarks 1. Since total probability is always 1, we have

$$\int_{-\infty}^{\infty} p(x)\,dx = \int_{-\infty}^{\infty} \phi(z)\,dz = 1$$

i.e., the total area under the normal probability curve is 1.

2. Since the areas under the normal probability curve have
been tabulated in terms of the standard normal variate Z in the
form of definite integral

$$\int_0^{z_1} \phi(z)\,dz = P(0 < Z < z_1),$$

for practical problems we don't deal with the variable X but first
convert it to S.N.V. Z. Next, we try to convert the required area
in the form $P(0 < Z < z_1)$ by using the following results :

$P(X > \mu) = P(Z > 0) = 0.5$
$P(X < \mu) = P(Z < 0) = 0.5$

and making use of the symmetry property of the distribution.

Fig. 14.6


Computation of Area to the Right of the Ordinate at X = a, i.e.,
to find $P(X > a)$.

Case (i). $a > \mu$, i.e., a is to the right of the mean ordinate.

[Fig. 14.7: the area $P(X > a) = P(Z > z_1)$ to the right of the ordinate at $X = a$ ($Z = z_1$), with the mean ordinate at $X = \mu$ ($Z = 0$)]

When $X = a$, $Z = \dfrac{a - \mu}{\sigma} = z_1$ (say).

$\therefore\ P(X > a) = P(Z > z_1)$
$= 0.5 - P(0 < Z < z_1)$

and the probability $P(0 < Z < z_1)$ can be read from the Table VI in
the Appendix.

Case (ii). $a < \mu$, i.e., a is to the left of the mean ordinate.

[Fig. 14.8: the ordinates at $X = a$ ($Z = -z_1$) and $X = \mu$ ($Z = 0$)]

Since $a < \mu$, the value of Z corresponding to $X = a$ will be
negative.

When $X = a$, $Z = \dfrac{a - \mu}{\sigma} = -z_1$ (say).

$\therefore\ P(X > a) = 0.5 + P(-z_1 < Z < 0)$ [From the diagram]
$= 0.5 + P(0 < Z < z_1)$ [By symmetry]

and $P(0 < Z < z_1)$ can be read from the Normal Tables.


Computation of the Area to the Left of the Ordinate at X = b,
i.e., to find $P(X < b)$.

Case (i). $b > \mu$, i.e., b is to the right of the ordinate at $X = \mu$.

When $X = b$, $Z = \dfrac{b - \mu}{\sigma} = z_1$ (say).

$\therefore\ P(X < b) = 0.5 + P(0 < Z < z_1)$ [Obvious from diagram]

Case (ii). $b < \mu$, i.e., b is to the left of the ordinate at $X = \mu$.

Fig. 14.10

When $X = b$, $Z = \dfrac{b - \mu}{\sigma} = -z_1$ (say).

$\therefore\ P(X < b) = P(Z < -z_1)$
$= P(Z > z_1)$ [By symmetry]
$= 0.5 - P(0 < Z < z_1)$
14.3.6. Importance of Normal Distribution. Normal distribution
has occupied a very important role in Statistics. We enumerate
below some of its important applications.

1. If X is a normal variate with mean $\mu$ and variance $\sigma^2$, then
we have proved that

$P(\mu - 3\sigma < X < \mu + 3\sigma) = P(-3 < Z < 3) = 0.9973$
$\Rightarrow\ P[\,|Z| > 3\,] = 1 - 0.9973 = 0.0027$

Thus, the probability of standard normal variate going outside
the limits $\pm 3$ is practically zero. In other words, in all probability
we should expect a standard normal variate to lie between the limits
$\pm 3$. This property of the normal distribution forms the basis of
entire large sample theory. [For details see chapter 17, tests for
proportions and variables.]

2. Most of the discrete probability distributions (e.g.,
Binomial distribution, Poisson distribution) tend to normal
distribution as n, the number of trials, increases. For large values of n,
computation of probability for discrete distributions becomes quite
tedious and time consuming. In such cases, normal approximation
can be used with great ease and convenience.

3. Almost all the exact sampling distributions, e.g., Student's
t-distribution, Snedecor's F-distribution, Fisher's Z-distribution and
the Chi-square distribution conform to normal distribution for large
degrees of freedom (i.e., as $n \to \infty$).

4. The whole theory of exact sample (small sample) tests,
viz., t, F, $\chi^2$ tests, etc., is based on the fundamental assumption that
the parent population from which the samples have been drawn
follows Normal distribution.


5. Perhaps, one of the most important applications of the
Normal distribution is inherent in one of the most fundamental
theorems in the theory of Statistics, viz., the Central Limit Theorem
which may be stated as follows :

"If $X_1, X_2, \dots, X_n$ are n independent random variables following
any distribution, then under certain very general conditions, their sum
$\Sigma X = X_1 + X_2 + \dots + X_n$ is asymptotically normally distributed, i.e.,
$\Sigma X$ follows normal distribution as $n \to \infty$".

An immediate consequence of this theorem is the following
result.

"If $X_1, X_2, \dots, X_n$ is a random sample of size n from any
population with mean $\mu$ and variance $\sigma^2$, then the sample mean

$$\bar{X} = \frac{1}{n}\,(X_1 + X_2 + \dots + X_n) = \frac{1}{n}\,\Sigma X,$$

is asymptotically normal (as $n \to \infty$) with mean $\mu$ and variance $\sigma^2/n$".


6. Normal distribution is used in Statistical Quality Control 
in Industry for the setting of control limits. 


7. W.J. Youden of the National Bureau of Standards describes
the importance of the Normal distribution artistically in the
following words :
THE NORMAL
LAW OF ERROR
STANDS OUT IN THE
EXPERIENCE OF MANKIND
AS ONE OF THE BROADEST
GENERALISATIONS OF NATURAL
PHILOSOPHY. IT SERVES AS THE
GUIDING INSTRUMENT IN RESEARCHES,
IN THE PHYSICAL AND SOCIAL SCIENCES
AND IN MEDICINE, AGRICULTURE AND
ENGINEERING. IT IS AN INDISPENSABLE TOOL FOR
THE ANALYSIS AND THE INTERPRETATION OF THE
BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT.

The above presentation, strikingly enough, gives the shape of
the normal probability curve.

8. Lipman reveals the popularity and importance of normal
distribution in the following quotation :

"Everybody believes in the law of errors (the normal curve), the
experimenters because they think it is a mathematical theorem, the
mathematicians because they think it is an experimental fact."
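The consequence of the Central Limit Theorem stated in item 5 can be illustrated by simulation. The sketch below is in Python and rests on several assumptions that are not in the text: the parent population is taken to be uniform on (0, 1), so that $\mu = 0.5$ and $\sigma^2 = 1/12$; the sample size, the number of replications and the seed are arbitrary; and the function name is hypothetical.

```python
import math
import random
import statistics


def simulate_sample_means(n: int, replications: int, seed: int = 0) -> list:
    """Draw `replications` samples of size n from Uniform(0, 1); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(replications)
    ]


n = 30
means = simulate_sample_means(n, replications=4000)

mu, var = 0.5, 1.0 / 12.0            # mean and variance of Uniform(0, 1)
std_error = math.sqrt(var / n)       # s.d. of the sample mean, sqrt(sigma^2 / n)

mean_of_means = statistics.fmean(means)
variance_of_means = statistics.variance(means)
share_within_196 = sum(abs(m - mu) <= 1.96 * std_error for m in means) / len(means)

print(f"mean of sample means     = {mean_of_means:.4f}  (theory {mu})")
print(f"variance of sample means = {variance_of_means:.5f} (theory {var / n:.5f})")
print(f"share within 1.96 s.e.   = {share_within_196:.3f}  (normal theory 0.95)")
```

Although the parent population here is far from normal, the sample means behave approximately like a normal variate with mean $\mu$ and variance $\sigma^2/n$.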


Example 14.18. Suppose the waist measurements W of 800
girls are normally distributed with mean 66 cms, and standard
deviation 5 cms. Find the number N of girls with waists

(i) between 65 and 70 cms,
(ii) greater than or equal to 72 cms.
[Delhi Uni. B.A. (Econ. Hons. I), 1985]

Solution. W : Waist measurements (in cms.) of girls. We
are given $W \sim N(\mu, \sigma^2)$, where $\mu = 66$ cms and $\sigma = 5$ cms.

$$Z = \frac{W - \mu}{\sigma} = \frac{W - 66}{5}$$ (Standard Normal Variate)

(i) The probability that a girl has waist between 65 cms and
70 cms is given by :

$P(65 < W < 70) = P(-0.2 < Z < 0.8)$
$= P(-0.2 < Z < 0) + P(0 < Z < 0.8)$
$= P(0 < Z < 0.2) + P(0 < Z < 0.8)$ (By symmetry)
$= 0.0793 + 0.2881 = 0.3674$

Hence in a group of 800 girls, the expected number of girls
with waists between 65 cms and 70 cms is

$800 \times 0.3674 = 293.92 \simeq 294$

(ii) The probability that a girl has waist greater than or equal
to 72 cms is given by

$P(W \ge 72) = P(Z \ge 1.2) = 0.5 - P(0 < Z < 1.2)$
$= 0.5 - 0.3849 = 0.1151$

Hence in a group of 800 girls, the expected number of girls
with waist greater than or equal to 72 cms is :

$800 \times 0.1151 = 92.08 \simeq 92$
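The arithmetic of Example 14.18 can be cross-checked without tables. The sketch below uses Python's standard `statistics.NormalDist` in place of the normal tables; the variable names are illustrative and not from the text.

```python
from statistics import NormalDist

waist = NormalDist(mu=66, sigma=5)   # W ~ N(66, 5^2), as given in the example
girls = 800

# (i) P(65 < W < 70) as a difference of two cumulative probabilities.
p_between = waist.cdf(70) - waist.cdf(65)

# (ii) P(W >= 72); for a continuous variate this equals P(W > 72).
p_at_least_72 = 1 - waist.cdf(72)

print(f"(i)  probability = {p_between:.4f}, expected number = {girls * p_between:.2f}")
print(f"(ii) probability = {p_at_least_72:.4f}, expected number = {girls * p_at_least_72:.2f}")
```

The cumulative probabilities are computed to full precision, so the products may differ from the table-based figures in the second decimal place, while the rounded counts agree.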


Example 14.19. Assume the mean height of soldiers to be 68.22
inches with a variance of 10.8 (inches)². How many soldiers in a regiment
of 1,000 would you expect to be (i) over six feet tall, and (ii) below
5.5 feet ? Assume heights to be normally distributed.

[Punjab Uni. M.A. (Econ.) 1980]
Solution. Let the variable X denote the height (in inches)
of the soldiers. Then we are given :

Mean = $\mu = 68.22$ and Variance = $\sigma^2 = 10.8$

(i) A soldier will be over 6 feet tall if X is greater than 72
(because X is height in inches).

When $X = 72$, $Z = \dfrac{X - \mu}{\sigma} = \dfrac{72 - 68.22}{\sqrt{10.8}} = \dfrac{3.78}{3.286} = 1.15$

The probability that a soldier is over 6 feet tall is given by :

$P(X > 72) = P(Z > 1.15) = 0.5 - P(0 < Z < 1.15)$
$= 0.5 - 0.3749 = 0.1251$ (From Tables)

Hence in a regiment of 1,000 soldiers, the number of soldiers
over 6 feet tall is :

$1000 \times 0.1251 = 125.1 \simeq 125$

(ii) The probability that a soldier is below 5.5 feet = 66 inches is given
by :

$P(X < 66) = P\left(Z < \dfrac{66 - 68.22}{3.286}\right) = P\left(Z < \dfrac{-2.22}{3.286}\right)$
$= P(Z < -0.6756) = P(Z > 0.6756)$ (By symmetry)
$= 0.5 - P(0 < Z < 0.6756)$
$= 0.5 - 0.2501 = 0.2499$ (approx.)

Hence the number of soldiers below 5.5 feet in a regiment
of 1,000 soldiers is $1000 \times 0.2499 = 249.9 \simeq 250$
Example 14.20. The weekly wages of 1,000 workmen are
normally distributed around a mean of Rs. 70 and with a standard
deviation of Rs. 5. Estimate the number of workers whose weekly
wages will be :
(i) between Rs. 70 and 72.
(ii) between Rs. 69 and 72.
(iii) more than Rs. 75.
(iv) less than Rs. 63.
(v) more than Rs. 80.

Also estimate the lowest weekly wages of the 100 highest paid
workers. [Delhi Uni. B.A. (Econ. Hons.) 1977]

Solution. Let the random variable X denote the weekly wages
in Rupees. Then X is a normal variable with mean $\mu = 70$ and
s.d. $\sigma = 5$. The standard normal variable corresponding to X is

$$Z = \frac{X - \mu}{\sigma} = \frac{X - 70}{5}$$

(i) We want $P(70 < X < 72)$.

When $X = 70$, $Z = \dfrac{70 - 70}{5} = 0$

When $X = 72$, $Z = \dfrac{72 - 70}{5} = 0.4$

$\therefore\ P(70 < X < 72) = P(0 < Z < 0.4) = 0.1554$

Therefore, the number of workers with weekly wages between
Rs. 70 and Rs. 72 is :

$1000 \times 0.1554 = 155.4 \simeq 155$

(ii) $P(69 < X < 72) = P(-0.2 < Z < 0.4)$
$= P(-0.2 < Z < 0) + P(0 < Z < 0.4)$
$= P(0 < Z < 0.2) + P(0 < Z < 0.4)$ (By symmetry)
$= 0.0793 + 0.1554 = 0.2347$

Hence, the required number of workers is :

$1000 \times 0.2347 = 234.7 \simeq 235$
(iii) We want $P(X > 75)$.

When $X = 75$, $Z = \dfrac{75 - 70}{5} = 1$

[Fig. 14.11: ordinates at $Z = 0$ and $Z = 1$]

$\therefore\ P(X > 75) = P(Z > 1) = 0.5 - P(0 < Z < 1)$
$= 0.5 - 0.3413 = 0.1587$

Hence the number of workers with weekly wages more than
Rs. 75 is :

$1000 \times 0.1587 = 158.7 \simeq 159$

(iv) $P(X < 63) = P(Z < -1.4)$ [When $X = 63$, $Z = \dfrac{63 - 70}{5} = -1.4$]
$= P(Z > 1.4)$ (By symmetry)
$= 0.5 - P(0 < Z < 1.4)$
$= 0.5 - 0.4192 = 0.0808$

Hence the number of workers with weekly wages less than
Rs. 63 is :

$1000 \times 0.0808 = 80.8 \simeq 81$

(v) $P(X > 80) = P(Z > 2)$ [When $X = 80$, $Z = \dfrac{80 - 70}{5} = 2$]
$= 0.5 - P(0 < Z < 2)$
$= 0.5 - 0.4772 = 0.0228$

Hence the number of workers with weekly wages over Rs. 80 is :

$1000 \times 0.0228 = 22.8 \simeq 23$

Proportion of the 100 highest paid workers is

$\dfrac{100}{1000} = \dfrac{1}{10} = 0.10$

We want to determine $X = x_1$, say, such that

$P(X > x_1) = 0.10$

When $X = x_1$, $Z = \dfrac{x_1 - 70}{5} = z_1$ (say). ...(*)

Then $P(Z > z_1) = 0.10$
$\Rightarrow\ P(0 < Z < z_1) = 0.5 - 0.1 = 0.40$

From the normal probability tables and (*) we get

$z_1 = \dfrac{x_1 - 70}{5} = 1.28$ (approx.)

$\Rightarrow\ x_1 = 70 + 5 \times 1.28 = 70 + 6.40 = 76.40$

Hence the lowest weekly wages of the 100 highest paid
workers are Rs. 76.40.
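The last part of Example 14.20 reads the tables in reverse: a probability is given and the value of the variate is sought. A minimal sketch in Python follows, using `NormalDist.inv_cdf` from the standard library as a stand-in for this reverse table look-up; the variable names are illustrative.

```python
from statistics import NormalDist

wages = NormalDist(mu=70, sigma=5)   # X ~ N(70, 5^2), as given in the example
workers = 1000

# Expected numbers for parts (i) to (v), from cumulative probabilities.
expected = {
    "(i) between 70 and 72": workers * (wages.cdf(72) - wages.cdf(70)),
    "(ii) between 69 and 72": workers * (wages.cdf(72) - wages.cdf(69)),
    "(iii) more than 75": workers * (1 - wages.cdf(75)),
    "(iv) less than 63": workers * wages.cdf(63),
    "(v) more than 80": workers * (1 - wages.cdf(80)),
}
for part, number in expected.items():
    print(f"{part}: {number:.1f}")

# Lowest wage of the 100 highest paid: x1 such that P(X > x1) = 100/1000,
# i.e. the point below which 90 per cent of the distribution lies.
cutoff = wages.inv_cdf(1 - 100 / workers)
print(f"lowest wage of the 100 highest paid = Rs. {cutoff:.2f}")
```

The cut-off computed this way differs from Rs. 76.40 only in the second decimal place, because the text rounds $z_1$ to 1.28.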

Example 14.21. A set of examination marks is approximately
normally distributed with a mean of 75 and standard deviation of 5.
If the top 5% of students get grade A and the bottom 25% get grade
F, what mark is the lowest A and what mark is the highest F ?

[Bombay Uni. B. Com., Oct. 1976]

Solution. Let X denote the marks in the examination. Then
X is normally distributed with mean $\mu = 75$ and s.d. $\sigma = 5$. Let $x_1$
be the lowest marks for grade A and $x_2$ be the highest marks for
grade F. Then we are given :

$P(X > x_1) = 0.05$ and $P(X < x_2) = 0.25$

Fig. 14.13

Then the standard normal variables corresponding to $x_1$ and
$x_2$ are given by [See diagram above] :

$Z = \dfrac{x_1 - \mu}{\sigma} = \dfrac{x_1 - 75}{5} = z_1$ (say)
$Z = \dfrac{x_2 - \mu}{\sigma} = \dfrac{x_2 - 75}{5} = -z_2$ (say) ...(*)

[Note the negative sign for $z_2$]. From the figure we obviously
get :

$P(0 < Z < z_1) = 0.45 \Rightarrow z_1 = 1.645$ (approx.) [From tables]
$P(-z_2 < Z < 0) = 0.25 \Rightarrow P(0 < Z < z_2) = 0.25$ (By symmetry)
$\Rightarrow z_2 = 0.675$ (approx.) [From tables]

Substituting for $z_1$ and $z_2$ in (*) we get :

$x_1 = 75 + 5z_1 = 75 + 5 \times 1.645 = 83.225 \simeq 83$
$x_2 = 75 - 5z_2 = 75 - 5 \times 0.675 = 71.625 \simeq 72$

Hence the lowest mark for grade A is 83 and the highest
mark for grade F is 72.
Example 14.22. For a normal distribution with mean $\mu$ and
standard deviation $\sigma$, obtain the first and third quartiles and also the
quartile deviation.

Solution. By definition of $Q_1$ and $Q_3$ we have :

$P(X < Q_1) = 0.25$
and $P(X > Q_3) = 0.25$

When $X = Q_1$, $Z = \dfrac{Q_1 - \mu}{\sigma} = -z_1$ (say),
When $X = Q_3$, $Z = \dfrac{Q_3 - \mu}{\sigma} = z_1$ ...(*)

(Obvious from the diagram because of symmetry).
Thus we get (obvious from the figure) :

$P(0 < Z < z_1) = 0.25$
$\Rightarrow z_1 = 0.6745$ (approx.) [From Normal Tables]

Substituting in (*) we get :

$Q_1 = \mu - \sigma z_1 = \mu - 0.6745\,\sigma$ ...(**)
and $Q_3 = \mu + \sigma z_1 = \mu + 0.6745\,\sigma$ ...(***)

Subtracting (**) from (***) we have :

$Q_3 - Q_1 = 2 \times 0.6745\,\sigma$

$\therefore$ Quartile Deviation $= \dfrac{Q_3 - Q_1}{2} = 0.6745\,\sigma$


Example 14.23. For a normal distribution with mean 50 and
s.d. 15, find $Q_1$ and $Q_3$.
(Bombay Uni. B. Com., May 1978)
Solution. We have : $\mu = 50$, $\sigma = 15$

$Q_1 = \text{Mean} - 0.6745\,\sigma = 50 - 0.6745 \times 15$
$= 50 - 10.1175 = 39.8825$
$Q_3 = \text{Mean} + 0.6745\,\sigma = 50 + 0.6745 \times 15$
$= 50 + 10.1175 = 60.1175$

Example 14.24. In a normal distribution 31% of the items are
under 45 and 8% are over 64. Find the mean and standard deviation
of the distribution.
[Delhi Uni. B.A. (Econ. Hons.) 1978 ;
Punjab Uni. M.A. (Econ). Oct. 1980]

Solution. Let X denote the variable under consideration.
Then we are given : $P(X < 45) = 0.31$ and $P(X > 64) = 0.08$

[Fig. 14.15: ordinates at $X = 45$ ($Z = -z_1$), $X = \mu$ ($Z = 0$) and $X = 64$ ($Z = z_2$)]

If X has a normal distribution with mean $\mu$ and s.d. $\sigma$, then
the standard variables corresponding to $X = 45$ and $X = 64$ are as
given below :

When $X = 45$, $Z = \dfrac{45 - \mu}{\sigma} = -z_1$ (say) ...(*)
[Note the negative sign]

When $X = 64$, $Z = \dfrac{64 - \mu}{\sigma} = z_2$ (say) ...(**)

From the diagram, it is obvious that

$P(0 < Z < z_2) = 0.42$
$\Rightarrow z_2 = 1.405$ (From Normal Tables)

Also $P(-z_1 < Z < 0) = 0.19$
$\Rightarrow P(0 < Z < z_1) = 0.19$ [By symmetry]
$\Rightarrow z_1 = 0.496$ (From Normal Tables)

Substituting the values of $z_1$ and $z_2$ in (*) and (**) we get :

$45 - \mu = -0.496\,\sigma$ ...(i)
$64 - \mu = 1.405\,\sigma$ ...(ii)

Subtracting (i) from (ii) we have :

$19 = 1.901\,\sigma \Rightarrow \sigma = 10$ (approx.)

Substituting in (i), we get :

$\mu = 45 + 0.496 \times 10 = 45 + 4.96 = 49.96 \simeq 50$ (approx.)

Hence mean is 50 and s.d. is 10.
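The method of Example 14.24, two known percentage points giving two linear equations in $\mu$ and $\sigma$, can be written as a small routine. The sketch below is in Python; `fit_from_two_percentiles` is a hypothetical name, and `NormalDist().inv_cdf` from the standard library is used as a stand-in for reading the normal tables in reverse.

```python
from statistics import NormalDist


def fit_from_two_percentiles(x_lo: float, p_below_lo: float,
                             x_hi: float, p_below_hi: float):
    """Solve x = mu + z * sigma at two points of known cumulative probability.

    Returns (mu, sigma).
    """
    z_lo = NormalDist().inv_cdf(p_below_lo)   # standard variate at x_lo
    z_hi = NormalDist().inv_cdf(p_below_hi)   # standard variate at x_hi
    sigma = (x_hi - x_lo) / (z_hi - z_lo)     # subtract the two equations
    mu = x_lo - z_lo * sigma                  # substitute back
    return mu, sigma


# Example 14.24: 31% of items under 45, and 8% over 64 (so 92% under 64).
mu, sigma = fit_from_two_percentiles(45, 0.31, 64, 1 - 0.08)
print(f"mean = {mu:.2f}, s.d. = {sigma:.2f}")
```

Both conditions are first expressed as "proportion below a value", which is why the 8% above 64 enters as 0.92.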
Example 14.25. In a certain examination the percentage of
passes and distinctions were 46 and 9 respectively. Estimate the
average marks obtained by the candidates, the minimum pass and
distinction marks being 40 and 75 respectively. (Assume the
distribution of marks to be normal.)

Solution. Let the marks in the examination be denoted by
the random variable X and let X be normally distributed with mean
$\mu$ and s.d. $\sigma$. A candidate passes the examination if $X \ge 40$ and
obtains a distinction if $X \ge 75$. We are given :

$P(X \ge 40) = 0.46$ and $P(X \ge 75) = 0.09$ ...(*)

The values of X are as shown in the following diagram.

Fig. 14.16 (a)          Fig. 14.16 (b)

From the diagrams it is obvious that both the values of X, viz.,
40 and 75, are to the right of the mean $\mu$ and hence the corresponding
values of Z will be positive.

When $X = 40$, $Z = \dfrac{40 - \mu}{\sigma} = z_1$ (say)
When $X = 75$, $Z = \dfrac{75 - \mu}{\sigma} = z_2$ (say) ...(**)

Therefore, using (*) we get :

$P(Z > z_1) = 0.46$ and $P(Z > z_2) = 0.09$
$\Rightarrow P(0 < Z < z_1) = 0.04$ and $P(0 < Z < z_2) = 0.41$ [Obvious from above diagrams]
$\Rightarrow z_1 = 0.10$ and $z_2 = 1.34$ [From Normal Tables]

Substituting in (**) we get :

$40 - \mu = 0.10\,\sigma$ ...(***)
$75 - \mu = 1.34\,\sigma$ ...(****)

Subtracting (***) from (****) we get :

$35 = 1.24\,\sigma \Rightarrow \sigma = \dfrac{35}{1.24} = 28.22$

Substituting this value in (***) we get :

$\mu = 40 - 0.10\,\sigma = 40 - 2.822 = 37.178 \simeq 37.2$


EXERCISE 14.3.

1. What are the main features of Normal Probability distribution ?
Can a normal probability distribution be fully determined if we know its mean
and standard deviation ? [Delhi Uni. B.A. (Econ. Hons.), 1977]

2. Write down the binomial, the Poisson and the normal probability
functions explaining the constants. State the range of the variables in each case.
Give one example each of binomial, Poisson and normal variables.

(Bombay Uni. B.Com., Nov. 1982)

3. State the distinctive features of Normal distribution. 

М [Delhi Uni. B.Com. (Hons.) 198Т\ 


4. State the important properties of the normal distribution.
[Delhi Uni. B.A. (Econ. Hons. I) (NS) 1983 ; C.A. (Intermediate), May 1983]

5. Why is the Normal distribution most popular ?
[Delhi Uni. B. Com. (Hons.) II, 1985]

6. Why does the Normal distribution hold the most honourable
position in the theory of probability ?
[Delhi Uni. B. Com. (Hons.) II, 1984]

7. Explain the distinctive features of Binomial, Normal and Poisson
probability distributions. When does a Binomial distribution tend to become
(i) a Normal and (ii) a Poisson distribution ? Explain clearly.

(Delhi Uni. M. Com., 1978)

8. If X is a random variable following normal distribution with mean
$\mu$ and s.d. $\sigma$, write its probability density function (p.d.f.). Also obtain the p.d.f.
of standard normal variate $Z = (X - \mu)/\sigma$.

9. The average daily sale of 500 branch offices was Rs. 150 thousand
and the standard deviation Rs. 15 thousand. Assuming the distribution to be
normal, indicate how many branches have sales between :

(i) Rs. 120 thousand and Rs. 145 thousand ?
(ii) Rs. 140 thousand and Rs. 165 thousand ?
(Himachal Pradesh Uni. M.B.A., 1976)
Ans. (i) 174, (ii) 295


10. In a sample of 120 workers in a factory the mean and standard
deviation of wages were Rs. 11.35 and Rs. 3.03 respectively. Find the percentage
of workers getting wages between Rs. 9 and Rs. 17 in the whole factory assuming
that the wages are normally distributed. [C.A. (Intermediate), May 1981]
Ans. 75.09%.

11. The income of a group of 10,000 persons was found to be normally
distributed with mean Rs. 750/- per month, and standard deviation Rs. 50/-. Find
(i) the number of persons with income less than Rs. 700/- p.m., (ii) number of
persons with income between Rs. 700/- and Rs. 800/- p.m.

[Delhi Uni. B.A. (Econ. Hons. I), 1983]

Ans. (i) 1587, (ii) 6826
12. Suppose that sizes of hats are approximately normally distributed
with mean of 18.5 cm and a standard deviation of 2.5 cm. How many hats in a
total of 2,000 will have sizes :
(i) between 18 cm and 20 cm ?
(ii) more than 20 cm ?
Area between t = 0 and t = 0.6 is 0.2257 and the area between t = 0 and
t = 0.2 is 0.0793, where t is the standard normal variate.
(Bombay Uni. B.Com., October 1931)

Ans. (i) 610, (ii) 549

13. A sample of 100 dry battery cells tested to find the length of life
produced the following results :

$\bar{X} = 12$ hours, $\sigma = 3$ hours

Assuming the data to be normally distributed, what percentage of battery
cells are expected to have life

(i) more than 15 hours ? (ii) between 10 and 14 hours ? and
(iii) less than 6 hours ?

[I.C.W.A. (Final), June 1979]
Ans. (i) 15.87%, (ii) 49.72%, (iii) 2.28%


14. The life time of a certain type of battery has a mean life of 400
hours and a standard deviation of 50 hours. Assuming normality for the
distribution of life-time, find :
(i) the percentage of batteries which have life-time of more than 350
hours,

(ii) the life-time value above which the best 25 per cent of the batteries
will have their life, and

(iii) the proportion of batteries that have a life-time between 300 hours
and 500 hours.

(Shivaji Uni. B.Com., 1978)
Ans. (i) 84.13%, (ii) 433.75 hours, (iii) 0.9544


15. Suppose that a doorway being constructed is to be used by a class
of people whose heights are normally distributed with mean 70" and standard
deviation 3". How high should the doorway be without causing more
than 25% of the people to bump their heads ? If the height of the door may be
fixed at 76", how many persons out of 5,000 are expected to bump their heads ?

Ans. 72.025" and 114.

16. Incomes of a group of 10,000 persons were found to be normally
distributed with mean Rs. 520 and standard deviation Rs. 60.

Find :

(i) the number of persons having income between Rs. 400 and Rs. 550,
(ii) the lowest income of the richest 500.

For a standard normal variate t, the area under the curve between t = 0
and t = 0.5 is 0.19146, the area between t = 0 and t = 1.645 is 0.45000 and the
area between t = 0 and t = 2 is 0.47725.

(Bombay Uni. B. Com., May, 1970)
Ans. (i) 6687, (ii) Rs. 618.70

17. A wholesale distributor of a product finds that the annual demand
for the product is normally distributed with a mean of 120 and standard
deviation of 16. If he orders only once a year, what quantity should be ordered to
ensure that there is only a five per cent chance of running short ?

Area between t = 0 and t = 1.64 is 0.45, where t is a standardised normal
variate.

(Bombay Uni. B.Com., May 1982)

Ans. Find x' s.t. $P(X > x') = 0.05$ ; x' = 146.24
18. The local authorities in a certain city instal 10,000 electric lamps in
the streets of the city. If these lamps have an average life of 1,000 burning hours
with standard deviation of 200 hours, assuming normality, what number of
lamps might be expected to fail (i) in the first 800 burning hours ? (ii) between
800 and 1,200 burning hours ? After what period of burning hours would you
expect that (a) 10% of the lamps would fail ? (b) 10% of the lamps would be
still burning ?

Ans. (i) 1587, (ii) 6826
(a) Hint. Find $x_1$ s.t. $P(X < x_1) = 0.10$ ; $x_1 = 744$
(b) Hint. Find $x_2$ s.t. $P(X > x_2) = 0.10$ ; $x_2 = 1256$

19. In a certain examination, mean of marks scored by 400 students
is 45 with a standard deviation of 15. Assuming the distribution to be normal,
find (i) the number of students securing marks between 30 and 60 ; (ii) the limits
between which marks of the middle 50% of the students lie.

(Area under standard normal curve between t = 0 and t = 1 is 0.3413.)
(Bombay Uni. B.Com., May 1980)
Ans. (i) 273, (ii) $Q_1 = 34.88$, $Q_3 = 55.12$

20. The mean and standard deviation of a normal distribution are 60
and 5 respectively. Find the inter-quartile range and the mean deviation of the
distribution.
(Bombay Uni. B.Com., May, 1970)
Ans. $Q_3 - Q_1 = 6.745$ ;
M.D. (about mean) = 4 approx.


21. At a certain examination, 10% of the students who appeared for
the paper in Statistics got less than 30 marks and 97% of the students got less
than 62 marks. Assuming the distribution to be normal, find the mean and the
standard deviation of the distribution.
(Bombay Uni. B.Com., 1978)


Ans. μ=43.04, σ=10.03

22. Of a large group of men, 5 per cent are under 60 inches in height
and 40 per cent are between 60 and 65 inches. Assuming a normal distribution,
find the mean height and standard deviation.

Ans. μ≈65.4, σ=3.29 (Delhi Uni. M. Com., 1975)


23. The following table gives frequencies of occurrence of a variate X
between certain limits :


Variate (X)                        Frequency (f)
Less than 40                            30
40 or more but less than 50             33
50 and more                             37
Total                                  100

The distribution is exactly normal. Find the average and standard
deviation of X.
(Delhi Uni. M. Com., 1974)
Ans. μ=46.14, σ=11.696


24. The results of a particular examination are given below in a sum- 
mary form: 


Result % of candidates 
(i) Passed with distinction 10 
(ii) Passed without distinction 60 
(iii) Failed 30 


It is known that a candidate gets plucked if he obtains less than 40
marks (out of 100) while he must obtain at least 75 marks in order to pass with
distinction. Determine the mean and standard deviation of the distribution of
marks assuming this to be normal.
(Delhi Uni., M. Com. 1980)
Ans. μ=50.17, σ=19.4
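
Exercises 21 to 24 all follow the same pattern: two percentage points of a normal distribution are known, and the mean and standard deviation are found from two linear equations. The sketch below works Exercise 24 on that pattern, assuming Python's standard `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal variate
z_fail = std_normal.inv_cdf(0.30)    # 30% failed:        P(X < 40) = 0.30
z_dist = std_normal.inv_cdf(0.90)    # 10% distinction:   P(X < 75) = 0.90

# Solve the two equations 40 = mu + z_fail*sigma and 75 = mu + z_dist*sigma.
sigma = (75 - 40) / (z_dist - z_fail)
mu = 40 - z_fail * sigma

print(round(mu, 2), round(sigma, 2))
```

The computed values agree with the printed answer to the accuracy of the four-figure tables used in the text.
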
25. If X ~ N(μ, σ²), what is the value of (i) Median (ii) Mode
(iii) Quartile deviation (iv) Mean deviation about mean ?


Ans. (i) μ, (ii) μ, (iii) 0.67σ ≈ (2/3)σ, (iv) √(2/π) σ ≈ (4/5)σ


26. The distribution of a variable X is given by the law :

\[ f(x) = \text{constant} \times e^{-\frac{1}{2}\left(\frac{x-100}{5}\right)^{2}}, \qquad -\infty < x < \infty \]

Write down the values of


(i) the constant,          (v) the standard deviation,
(ii) the mean,             (vi) the mean deviation and
(iii) the median,          (vii) the quartile deviation,
(iv) the mode,             of the distribution.


Ans. (i) 1/(5√(2π)), (ii) 100, (iii) 100, (iv) 100, (v) 5,
(vi) (4/5)×5=4, (vii) (2/3)×5=3.33 (approx.)

27. Criticise the following statements :


(i) The mean of a symmetrical binomial distribution is 5 and the number
of trials is 12.


(ii) The mean of a Poisson distribution is 5 and the standard deviation
is 3.


(iii) The mean of a normal distribution is 5 and the third order central
moment (μ₃) is 2.


[I.C.W.A. (Final), June 1979]
Ans. (i) Wrong. Data gives p=5/12, but for a symmetrical binomial distribution,
p=1/2.
(ii) Wrong, because for a Poisson distribution mean=variance.
(iii) Wrong, because for a normal distribution μ₃=0.


28. A student obtained the following results. Comment on the accuracy
of his results.


(i) For a binomial distribution, mean=4, variance=3.
(ii) For a Poisson distribution, mean=10, s.d.=5.
(iii) For a normal distribution, mean=50, median=52.


[I.C.W.A. (Final), Dec. 1980]
Ans. (i) Correct, (ii) Wrong, (iii) Wrong.




Sampling Theory and Design of 
Sample Surveys 


15.1. Introduction. The science of Statistics may be broadly
studied under the following two headings:


(a) Descriptive,
(b) Inductive.


So far (Chapters 1 to 11), we have confined the discussion to
Descriptive Statistics, which consists in describing some characteristics
of the numerical data. Inductive Statistics, also known as
Statistical Inference, may be termed as the logic of drawing statistically
valid conclusions about the totality of cases or items, termed
the population, in any statistical investigation on the basis of examining
a part of the population, termed the sample, which is drawn
from the population in a scientific manner. In the modern ‘decision
making process’ in different fields of human activity, including the
ordinary actions of our daily life, most of our decisions and attitudes
depend very much upon the inspection or examination of only
a few objects or items out of the total lot. This process of studying
only the sample data and then generalising the results to the population
(i.e., drawing inferences about the population on the basis of
sample study) involves an element of risk, the risk of making wrong
decisions. Evaluation of this risk in terms of probability is discussed
in Chapter 16. In this chapter we shall discuss the various techniques
of drawing samples from the population.


15.2. Universe or Population. In any statistical investigation
the interest usually lies in studying the various characteristics
relating to items or individuals belonging to a particular group.
This group of individuals under study is known as the population or
universe. For example, if an enquiry is intended to determine the
average per capita income of the people in a particular city, the
population will comprise all the earning people in that city. On the
other hand, if we want to study the expenditure habits of the families
in that city, then the population will consist of all the households
in that city. Further, if we want to study the quality of the
manufactured product in an industrial concern during the day, then
the population will consist of the day's total production. Thus, “In
Statistics, population is the aggregate of objects, animate or inanimate,
under study in any statistical investigation”. In sampling theory,
the population means the larger group from which the samples are
drawn.


A population containing a finite number of objects or items
is known as a finite population, e.g., the students in a college, the day’s
production in an industrial concern, the population of a city or a
town, etc. On the other hand, a population having an infinite number
of objects, or with the number of objects so large as to appear
practically infinite, is termed an infinite population, e.g., the population
of temperatures at various points of the atmosphere; the
population of the heights, weights or ages of the people in the
country (each of these variables can take any numerical value in a
particular interval); the population of stars in the sky, etc. As will
be seen later (Chapter 16), infinite populations are better for
sampling studies. The population may further be classified as existent
or hypothetical. A population consisting of concrete objects is
known as an existent population, e.g., the population of (i) the books
in a library, (ii) the aeroplanes in the Indian Air Force, (iii) the
scooters in Delhi, etc. On the other hand, if the population does
not consist of concrete objects, i.e., it consists of imaginary objects,
then it is called a hypothetical population. For instance, the populations
of the throws of a die or a coin, thrown an infinite number of
times, are hypothetical populations.


15.3. Sampling. A finite subset of the population, selected
from it with the objective of investigating its properties, is called a
sample and the number of units in the sample is known as the
sample size. Sampling is a tool which enables us to draw conclusions
about the characteristics of the population after studying only
those objects or items that are included in the sample.


The main objectives of the sampling theory are :


(i) To obtain the optimum results, i.e., the maximum information
about the characteristics of the population with the available
resources at our disposal in terms of time, money and manpower, by
studying the sample values only.


(ii) To obtain the best possible estimates of the population
parameters. [See § 15.4].


Although the scientific development of the theory of sampling
has taken place only during the last few decades, the idea of sampling
is very old. From times immemorial, people have been using
it without knowing that some scientific procedure has been used in
arriving at the conclusions. On inspecting the sample of a particular
stuff, we arrive at a conclusion about accepting or rejecting it. For
example, the consumer examines only a handful of the rice, pulses
or any commodity in a shop to assess its quality and then decides to
buy it or not. The housewife usually tastes a spoonful of the
cooked products to ascertain if it is properly cooked and also to see
if it contains the proper quantity of salt or sugar. The consumer
ascertains the quality of the grapes by testing one or two from the
seller's basket. The intelligence of the individuals in a subject is
estimated by the university by giving them a 3-hour test. A businessman
orders the products after examining only a sample from it.
In fact, the entire business is done on the basis of display of a few
specimen samples only. The error involved in approximations
about the population characteristics on the basis of the sample is
known as sampling error and is inherent and unavoidable in any
sampling scheme.


15.4. Parameter and Statistic. The statistical constants of
the population like mean (μ), variance (σ²), skewness (β₁), kurtosis
(β₂), moments (μᵣ), correlation coefficient (ρ), etc., are known as
parameters. We can compute similar statistical constants for the
sample drawn from the given population. Prof. R.A. Fisher termed
the statistical constants of the sample like mean (x̄), variance (s²),
skewness (b₁), kurtosis (b₂), moments (mᵣ), correlation coefficient (r),
etc., as statistics. Obviously, parameters are functions of the
population values while statistics are functions of the sample
observations.

Let us consider a finite population of N units and let $y_1, y_2, \dots, y_N$
be the observations on the N units in the population. Suppose we
draw a sample of size n from this population. Let $x_1, x_2, \dots, x_n$ be the
observations on the sample units. Then we have, by definition :

\[ \mu = \frac{1}{N}\,(y_1 + y_2 + \dots + y_N) = \frac{1}{N}\sum_{i=1}^{N} y_i \tag{15.1} \]

\[ \sigma^2 = \frac{1}{N}\left[(y_1-\mu)^2 + (y_2-\mu)^2 + \dots + (y_N-\mu)^2\right] = \frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^2 \tag{15.2} \]

The sample mean ($\bar{x}$) and variance ($s^2$) are given by :

\[ \bar{x} = \frac{1}{n}\,(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{15.3} \]

\[ s^2 = \frac{1}{n}\left[(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2\right] = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \tag{15.4} \]

Generally, the population parameters are unknown and their
estimates provided by the appropriate sample statistics are used.
Obviously, the sample statistics are functions of the sample observations
and vary from sample to sample. Thus, if t is any general
statistic, then we may write t as a function of the sample observations
$x_1, x_2, \dots, x_n$ as given below :

\[ t = t(x_1, x_2, \dots, x_n) \tag{15.5} \]

Remark. A statistic $t = t(x_1, x_2, \dots, x_n)$ is said to be an unbiased
estimate of the population parameter θ if E(t)=θ. In other words, if

\[ E(\text{Statistic}) = \text{Parameter}, \tag{15.5a} \]

then the statistic is said to be an unbiased estimate of the parameter.


15.4.1. Sampling Distribution. If we draw a sample of size n
from a given finite population of size N, then the total number of
possible samples is :

\[ {}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} = k \ \text{(say)}. \]

For each of these k samples we can compute some statistic
$t = t(x_1, x_2, \dots, x_n)$, in particular the mean $\bar{x}$, the variance $s^2$, etc., as
given below :


Sample Number          Statistic
                    t       x̄       s²
     1              t₁      x̄₁      s₁²
     2              t₂      x̄₂      s₂²
     3              t₃      x̄₃      s₃²
     ...            ...     ...     ...
     k              tₖ      x̄ₖ      sₖ²


The set of the values of the statistic so obtained, one for each
sample, constitutes what is called the sampling distribution of the
statistic. For example, the values $t_1, t_2, t_3, \dots, t_k$ determine the
sampling distribution of the statistic t. In other words, the statistic t
may be regarded as a random variable which can take the values
$t_1, t_2, t_3, \dots, t_k$ and we can compute the various statistical constants
like mean, variance, skewness, kurtosis, etc., for its distribution. For
example, the mean and variance of the sampling distribution of the
statistic t are given by :

\[ \bar{t} = \frac{1}{k}\,(t_1 + t_2 + \dots + t_k) = \frac{1}{k}\sum_{i=1}^{k} t_i \tag{15.6} \]

\[ \operatorname{Var}(t) = \frac{1}{k}\left[(t_1-\bar{t})^2 + (t_2-\bar{t})^2 + \dots + (t_k-\bar{t})^2\right] = \frac{1}{k}\sum_{i=1}^{k}(t_i-\bar{t})^2 \tag{15.7} \]

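The construction of a sampling distribution can be made concrete by listing every possible sample from a very small population. The sketch below is an illustration only: the five population values and the sample size n=2 are hypothetical, and the statistic chosen is the sample mean. It enumerates all k samples, then applies formulae (15.6) and (15.7) to the k sample means.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, pvariance

population = [2, 4, 6, 8, 10]   # hypothetical finite population, N = 5
n = 2                            # sample size

# Every possible sample of size n (drawn without replacement).
samples = list(combinations(population, n))
k = len(samples)                 # should equal NCn
assert k == comb(len(population), n)

# One value of the statistic (here the sample mean) for each sample.
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)       # formula (15.6) with t = sample mean
var_of_means = pvariance(sample_means)   # formula (15.7): divisor k
std_dev_of_means = sqrt(var_of_means)    # standard deviation of the sampling distribution

print(k, mean_of_means, var_of_means, round(std_dev_of_means, 3))
```

In this illustration the mean of the sampling distribution of the sample mean equals the population mean, which is what the Remark on unbiased estimates in § 15.4 requires of an unbiased statistic.
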

15.4.2. Standard Error. The standard deviation of the
sampling distribution of a statistic is known as its Standard Error
(S.E.). Thus, the Standard Error of the statistic t is given by :

\[ \text{S.E.}(t) = \sqrt{\operatorname{Var}(t)} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}(t_i-\bar{t})^2} \tag{15.8} \]

In particular, the S.E. of the sampling distribution of the
mean $\bar{x}$ is given by the standard deviation of the values $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$.


The derivation of the standard errors of the sampling distributions
of various statistics is quite difficult and beyond the scope
of the book. The standard errors of the sampling distributions of
some of the well-known statistics, where n is the sample size, σ is
the population standard deviation, P is the population proportion
and Q=1−P, and n₁ and n₂ represent the sizes of two samples respectively
(the suffixes 1 and 2 referring to the two populations), are given in the
following table :


Statistic                                   Standard Error

Sample mean (x̄)                             σ/√n
Observed sample proportion (p)              √(PQ/n)
Sample standard deviation (s)               √(σ²/2n)
Sample variance (s²)                        σ²√(2/n)
Quartiles                                   1.36263 σ/√n
Median                                      1.25331 σ/√n
Sample correlation coefficient (r)          (1−ρ²)/√n, ρ being the population
                                            correlation coefficient
Sample moment μ₃                            σ³√(6/n)
Sample moment μ₄                            σ⁴√(96/n)
Coefficient of variation (v)                (v/√(2n)) √(1+2v²/10⁴) ≈ v/√(2n)
Difference of two means (x̄₁−x̄₂)            √(σ₁²/n₁ + σ₂²/n₂)
Difference of two s.d.'s (s₁−s₂)            √(σ₁²/2n₁ + σ₂²/2n₂)
Difference of two proportions (p₁−p₂)       √(P₁Q₁/n₁ + P₂Q₂/n₂)


The above formulae for the standard errors are obtained in
random sampling from an infinite population or from a very large
population, so that the sample size n is relatively very small as compared
with the population size N and consequently n/N is neglected.
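
A few of the tabulated standard errors can be evaluated directly. The sketch below covers only the sample mean, the observed sample proportion and the median; the function names and the numerical inputs (σ=15, P=0.5, n=400) are illustrative assumptions, not values taken from the text.

```python
from math import sqrt

def se_mean(sigma: float, n: int) -> float:
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_proportion(P: float, n: int) -> float:
    """S.E. of the observed sample proportion: sqrt(P*Q/n), where Q = 1 - P."""
    return sqrt(P * (1 - P) / n)

def se_median(sigma: float, n: int) -> float:
    """S.E. of the sample median: 1.25331 * sigma / sqrt(n)."""
    return 1.25331 * sigma / sqrt(n)

# Illustrative inputs: sigma = 15, P = 0.5, n = 400.
print(se_mean(15, 400))
print(se_proportion(0.5, 400))
print(se_median(15, 400))
```

For the same σ and n the median has a larger standard error than the mean, which is why the mean is usually preferred as an estimate of the centre of a normal population.
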


Remark 1. The reciprocal of the S.E. of a statistic gives a measure
of the precision or reliability of the estimate of the parameter.


15.5. Principles of Sampling. The fact that the characteristics
of the sample (sample statistics) provide a fairly good idea about
the population characteristics (population parameters) is borne out
by the theory of probability. We discuss below some important
laws which form the basis of the sampling theory.


15.5.1. Law of Statistical Regularity. This law has its origin
in the mathematical theory of probability. In the words of L.R.
Conner, “The law of statistical regularity lays down that a group of
objects chosen at random from a larger group tends to possess the
characteristics of that large group (universe)”. According to King,
“the law of statistical regularity lays down that the moderately large
number of items chosen at random from a large group are almost sure
on the average to possess the characteristics of the large group”.


The principle of statistical regularity emphasises the
following two points :


(i) Large sample size. Logically, it seems that as the sample size
increases, the sample is more likely to reveal the true characteristics of
the population and thus provide better estimates of the parameters.
It is known that the reliability of the sample statistic as an estimate
of the population parameter is proportional to the square root of the
sample size n. But due to certain limitations in terms of time, money
and manpower, it is not always possible to take very large samples.
Moreover, the effort and cost of drawing large samples might outweigh
the utility of the sample study as against the complete enumeration
(census).


(ii) Random selection. The sample should be selected at 
random from the population. By random selection we mean a 
selection in which each and every unit in the population has an 
equal chance of being selected in the sample. 


If a sample is selected such that the above two conditions are
satisfied, then it will depict the true characteristics of the population
fairly accurately and can be used for drawing valid inferences
about the population. For example, if we are interested in studying
the average height of the students in Delhi University, then it is not
desirable to resort to 100% enumeration of the students in the university.
A fairly adequate sample of the students from each college
may be selected at random and the average height of the students
selected in the samples may be computed. Since the sample is
random, it would be representative of the population and the average
so obtained will not differ much from the true value (i.e., the
average computed by the complete enumeration). This difference
is attributed to fluctuations of sampling. [For detailed discussion
of drawing random samples, see Simple Random Sampling § 15.11
and Stratified Random Sampling § 15.13].


15.5.2. Principle of Inertia of Large Numbers. An immediate
deduction from the Principle of Statistical Regularity is the Principle
of Inertia of Large Numbers which states, “Other things being equal,
as the sample size increases, the results tend to be more reliable and
accurate”. This is based on the fact that the behaviour of a
phenomenon en masse, i.e., on a large scale, is generally stable. By
this we mean that if individual events are observed, their behaviour
may be erratic and unpredictable but when a large number of events
are considered they tend to behave in a stable pattern. This is
because a number of forces operate on the given phenomenon and
if the number of units is large, then the typical odd variations in one
part of the universe in one direction will get neutralised by the variations
in an equally big part of the universe in the other direction.
According to A.L. Bowley, “Great numbers and averages resulting
from them, such as we always obtain in measuring social phenomena,
have a great inertia.” Thus in dealing with large numbers, the
variations in the component parts tend to balance each other and
consequently the variation in the aggregate result is likely to be
insignificant. However, it should not be inferred that in case of
large numbers, there is no variation at all. Large numbers are
relatively more stable in their characteristics than the small numbers.
They (large numbers) also exhibit variations but they are of very
small magnitude and intensity and are not violent in nature. For
example, if a coin is tossed, say, 20 times then nothing can be said
with certainty about the proportion of heads. We may get 0, 1, 2, ...,
or even all the 20 heads. But if it is thrown at random a very large
number of times, say, 5,000 times, then we may expect on the
average 50% heads and 50% tails. As another illustration let us
consider the production of a particular commodity, say, rice in two
districts in a state for a number of years. The figures will show
great variations due to favourable or unfavourable conditions in
that particular region. However, the figures for the production of
rice for the whole state over a number of years will show relatively
lesser variations because lower production in some of the districts will
be compensated by the excessive production in some other districts
of the state. Arguing similarly we find that the production figures
for the whole of India will show still lesser variation and for the
entire world it would be more or less stable.


15.5.3. Principle of Persistence of Small Numbers. If some
of the items in a population possess markedly distinct characteristics
from the remaining items then this tendency would be revealed in
the sample values also. Rather this tendency of persistence will be
there even if the population size is increased or even in the case of
large samples. For example, if the day's production of any manufacturing
unit is increased four-fold, the proportion of defectives in the
lot remains more or less the same. This means that the number of
defectives in the lot will also increase more or less in the same
proportion. Similarly, if a random sample of size n from the lot
gives a fraction defective,

\[ p = \frac{1}{n} \times (\text{Number of defectives in the sample}), \]

and if the sample size is doubled or trebled, the fraction defective
will more or less remain the same.
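
Both the coin-tossing illustration of § 15.5.2 and the fraction-defective argument of § 15.5.3 can be tried out by simulation. The following is a rough sketch only: the seed, the assumed 4% defective rate and the sample sizes are arbitrary choices made for illustration, and a different seed will give slightly different figures.

```python
import random

def proportion_of_successes(n_trials: int, p: float, rng: random.Random) -> float:
    """Observed proportion of successes in n_trials independent trials,
    each succeeding with probability p."""
    successes = sum(1 for _ in range(n_trials) if rng.random() < p)
    return successes / n_trials

rng = random.Random(15)  # fixed (arbitrary) seed so that the run is repeatable

# Coin tossing (p = 0.5): a small and a very large number of throws.
for n in (20, 5000):
    print("tosses:", n, "proportion of heads:", proportion_of_successes(n, 0.5, rng))

# Fraction defective (assumed 4% defectives): sample size doubled and trebled.
for n in (2000, 4000, 6000):
    print("sample size:", n, "fraction defective:", proportion_of_successes(n, 0.04, rng))
```

With 20 tosses the proportion of heads can fall well away from one-half, whereas with 5,000 tosses it is expected to lie close to it; the observed fraction defective is likewise expected to stay near the assumed rate as the sample size is doubled or trebled.
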


15.5.4. Principle of Validity. A sampling design is termed as
valid if it enables us to obtain valid tests and estimates about the
population parameters. This principle is satisfied by the samples
drawn by the technique of probability sampling, discussed in
§ 15.9.2.

15.5.5. Principle of Optimisation. This principle stresses the
need of obtaining optimum results in terms of efficiency and cost of
the sampling design with the resources available at our disposal. As
has been pointed out earlier, a measure of efficiency or reliability
of an estimate of the population parameter is provided by the reciprocal
of the standard error of the estimate and the cost of the
design is determined by the total expenses incurred in terms of
money and manpower. This principle aims at :


(i) obtaining a desired level of efficiency at minimum cost and


(ii) obtaining maximum possible efficiency with given level of
cost.


15.6. Census Versus Sample Enumeration. For any statistical
enquiry in any field of human activity, whether it is in business, 
economics or social sciences, the basic problem is to obtain 
adequate and reliable data relating to the particular phenomenon 
under study. There are two methods of collecting the data : 


(i) The Census Method or Complete Enumeration. 
(ii) The Sample Method or Partial Enumeration. 


Census Method. In the census method we resort to 100% 
inspection of the population and enumerate each and every unit of 
the population. In the sample method we inspect only a selected,
representative and adequate fraction (finite subset) of the population 
and after analysing the results of the sample data we draw conclu- 
sions about the characteristics of the population. 


The census method seems to provide more accurate and exact
information as compared to sample enumeration as the information
is collected from each and every unit of the population. Moreover,
it affords more extensive and detailed study. For instance, the
population census conducted by the Government of India every 10
years collects information not only about the population but also
obtains data relating to age, marital status, occupation, religion,
education, employment, income, property, etc. The census method
has its obvious limitations and drawbacks given below :


(i) The complete enumeration of the population requires a lot
of time, money, manpower and administrative personnel. As such
this method can be adopted only by the government and big organisations
who have vast resources at their disposal.


(ii) Since the entire population is to be enumerated, the census
method is usually very time consuming. If the population is sufficiently
large, then it is possible that the processing and the analysis
of the data might take so much time that when the results are available
they are not of much use because of changed conditions.


Remark. When to use Census Method? Census method is
recommended in the following situations :

(a) If the information is required about each and every unit
of the population, there is no way but to resort to 100% enumeration.


(b) In any manufacturing process in industry, 100% enumeration
should be taken recourse to under the following conditions :


(i) The occurrence of a defect may cause loss of life or serious
casualty to personnel.


(ii) A defect may cause serious malfunction of the equipment.


It may also be desirable to carry out complete census if,
(i) N, the lot size is small and
(ii) the incoming lot quality is poor or unknown.


Sample Method. The sample method has a number of
distinct advantages over the complete enumeration method. Prof.
R.A. Fisher sums up the advantages of sampling techniques over
complete census in just four words : Speed, Economy, Adaptability
and Scientific Approach. A properly designed and carefully executed
sampling plan yields fairly good results, often better than those
obtained by the census method. We summarise below the merits of
the sample method over the census method :


1. Speed, i.e., less time. Since only a part of the population
is to be inspected and examined, the sample method results in a
considerable amount of saving in time and labour. There is saving in
time not only in conducting the sampling enquiry but also in the
processing, editing and analysing the data. This is a very sensitive
and important point for the statistical investigations where the
results are urgently and quickly needed.


2. Economy, i.e., Reduced Cost of the Enquiry. The sample
method is much more economical than a complete census. In a
sample enquiry, there is reduction in the cost of collection of the
information, administration, transport, training and man hours.
Although the labour and the expenses of obtaining information per
unit are generally larger in a sample enquiry than in the census
method, the overall expenses of a sample survey are relatively much
less, since only a fraction of the population is to be enumerated.
This is particularly significant in conducting socio-economic surveys
in developing countries with budding economies who cannot afford
a complete census because of lack of finances.


3. Administrative Convenience. A complete census requires
a very huge administrative set-up involving a lot of personnel, trained
investigators and above all the co-ordination between the various
operating agencies. On the other hand, the organisation and
administration of a sample survey is relatively much more convenient as it
requires fewer personnel and the field of enquiry is also limited.


4. Reliability. In the census, the sampling errors are com- 
pletely absent. If the non-sampling errors are also absent, the results 
would be 100% accurate. [For sampling and non-sampling errors 
see § 15.9.] On the other hand, a sample enquiry contains both
sampling and non-sampling errors. In spite of this weakness, a 
carefully designed and scientifically executed sample survey gives 
results which are more reliable than those obtained from a complete 
census. This is because of the following reasons : 


(i) It is always possible to ascertain the extent of sampling
error and degree of reliability of the results. Even the desired
degree of accuracy can be achieved through sampling using different
devices.


(ii) The non-sampling errors such as due to measuring and
recording observations, inaccuracy or incompleteness of information,
location of units, non-response or incomplete response, training
of investigators, interpretation of questions, bias of the investigators,
etc., are of a more serious nature in a complete census. In a
sample survey these errors can be effectively controlled and minimised
by (i) employing highly qualified, skilled and trained personnel,
(ii) imparting adequate training to the investigators for
conducting the enquiry, (iii) better supervision, (iv) using more
sophisticated equipment and statistical techniques for the processing
and analysis of the relatively limited data.


Follow-up work in case of non-response or incomplete
response can be effectively undertaken in a sample survey as compared
with a census. The effective reduction ...


(i) Approximations in measurements, e.g., the heights of individuals
may be approximated to 1/10th of a centimetre, age may be measured
correct to the nearest month, weight may be measured correct to 1/10th
of a kilogram, distance may be measured correct to the nearest metre
and so on. Thus, in all such measurements, there is bound to be a
difference between the observed value and the true value.


(ii) Approximations in rounding of the figures to the nearest
hundreds, thousands, millions, etc., or in the rounding of decimals.


(iii) The biases due to faulty collection and analysis of the data 
and biases in the presentation and interpretation of the results. 


(iv) Personal biases of the investigators, and so on. 


In any statistical investigation, these errors, i.e., the discrepancies
between the estimated and the actual values, are the net
effect of a multiplicity of factors and can be broadly classified into
two groups discussed below.


15.8.1. Sampling and Non-Sampling Errors. The inaccuracies 
or errors in any statistical investigation, i.e., in the collection,
processing, analysis and interpretation of the data may be broadly 
classified as follows : 

(i) Sampling Errors and (ii) Non-Sampling Errors. 


Sampling Errors. In a sample survey, since only a small
portion of the population is studied, its results are bound to differ
from the census results and thus have a certain amount of error.
This error would always be there even if the sample is drawn
at random and is highly representative. This error is attributed
to fluctuations of sampling and is called sampling error.
Sampling error is due to the fact that only a subset of the population
(i.e., sample) has been used to estimate the population parameters
and draw inferences about the population. Thus, sampling error is
present only in a sample survey and is completely absent in the census
method.


Sampling errors are primarily due to the following reasons : 


1. Faulty selection of the sample. Some of the bias is introduc- 
ed by the use of defective sampling technique for the selection of a 
sample, e.g., purposive or judgment sampling in which the investi- 
gator deliberately selects a representative sample to obtain certain 
results. This bias can be overcome by strictly adhering to a simple 
random sample or by selecting a sample at random subject to
restrictions which while improving the accuracy are of such nature 
that they do not introduce bias in the results. 


2. Substitution. If difficulties arise in enumerating a particular
sampling unit included in the random sample, the investigators
usually substitute a convenient member of the population. This
obviously leads to some bias since the characteristics possessed by
the substituted unit will usually be different from those possessed by
the unit originally included in the sample.


3. Faulty demarcation of sampling units. Bias due to defective
demarcation of sampling units is particularly significant in area
surveys such as agricultural experiments in the field or crop cutting
surveys, etc. In such surveys, while dealing with border line cases, it
depends more or less on the discretion of the investigator whether
to include them in the sample or not.


4. Error due to bias in the estimation method. Sampling
method consists in estimating the parameters of the population by
appropriate statistics computed from the sample. Improper choice
of the estimation techniques might introduce the error. For example,
in simple random sampling, if $x_1, x_2, \dots, x_n$ are observations on the
n sampled units, then the sample variance

\[ s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \]

is a biased estimator of the population variance $\sigma^2$, while an unbiased
estimate of $\sigma^2$ is given by

\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 \]


5. Variability of the population. Sampling error also depends
on the variability or heterogeneity of the population to be sampled.
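
The bias mentioned under item 4 above can be seen by direct enumeration. The sketch below is an illustration under stated assumptions: the five population values are hypothetical, the sample size is n=2, and samples are drawn with replacement (to imitate sampling from an infinite population), so that every ordered sample is equally likely. It averages the divisor-n variance and the divisor-(n−1) variance over all possible samples.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 4, 6, 8, 10]        # hypothetical population values
sigma_sq = pvariance(population)     # population variance (divisor N)
n = 2                                # sample size

# All equally likely ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

# Average of each estimator over every possible sample, i.e. its expected value.
expected_s_sq = mean(pvariance(s) for s in samples)  # divisor n
expected_S_sq = mean(variance(s) for s in samples)   # divisor n - 1

print(sigma_sq, expected_s_sq, expected_S_sq)
```

In this illustration the divisor-n variance averages to (n−1)/n times the population variance, while the divisor-(n−1) variance averages to the population variance itself.
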




Remark 1. A measure of the sampling error is provided by the 
standard error of the estimate. The knowledge and estimation of 
the sampling error reduces the element of uncertainty. The reliability 
or efficiency of a sampling plan is determined by the reciprocal 
of the standard error of the estimate and is called the precision of the 
estimate. In a sample survey an attempt is made to minimise this 
sampling error, which is the same as increasing the precision of the 
estimate. In most situations it has been observed that the standard 
error of the estimate is inversely proportional to the square root 
of the sample size. [See § 15.4.2, page 796.] In other words, the 
reliability or efficiency of the sampling design, which is thus directly 
proportional to the square root of the sample size, can be increased 
by taking large samples as shown in the following diagram. However, 
the sample size can be increased only up to certain limits keeping 
in view the time and money factors at our disposal, otherwise the 
very purpose of the sample survey will be defeated. 
[Fig. 15.1. Sampling error plotted against sample size.] 
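
The square-root relationship in Remark 1 can be tabulated with a few lines of code. This is a minimal sketch, assuming the familiar formula $\sigma/\sqrt{n}$ for the standard error of a sample mean; the value $\sigma = 10$ and the sample sizes are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean, assuming S.E. = sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def precision(sigma, n):
    """Precision, taken as the reciprocal of the standard error."""
    return 1.0 / standard_error(sigma, n)


sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n), precision(sigma, n))
```

Each fourfold increase in the sample size only halves the standard error, which is why the gain in precision has to be weighed against the time and money needed for the extra units.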


Non-Sampling Errors. Non-sampling errors are not attributed 
to chance and are a consequence of certain factors which are 
within human control. In other words, they are due to certain 
causes which can be traced and may arise at any stage of the enquiry, 
viz., planning and execution of the survey and collection, processing 
and analysis of the data. Non-sampling errors are thus present 
in both census surveys and sample surveys. Obviously, non-sampling 
errors will be of larger magnitude in a census survey than 
in a sample survey because they increase with the 
number of units to be examined and enumerated. It is very difficult 
to prepare an exhaustive list of all the sources of non-sampling 
errors. We enumerate below some of the important factors 
responsible for non-sampling errors in any survey (census or sample): 


1. Faulty planning, including vague and faulty definitions of 
the population or the statistical units to be used, or an incomplete 
list of population members (i.e., an incomplete frame in the case of 
a sample survey). 


2. Vague and imperfect questionnaire which might result in 
incomplete or wrong information. 

3. Defective methods of interviewing and asking questions. 

4. Vagueness about the type of the data to be collected. 


5. Exaggerated or wrong answers to the questions which 
appeal to the pride or prestige or self-interest of the respondents. 
For example, a person may overstate his education or income or 
understate his age, or he may give wrong statements to safeguard his 
self-interest. 


6. Personal bias of the investigator. 


7. Lack of trained and qualified investigators and lack of 
supervisory staff. 


8. Failure of the respondents' memory to recall the events or 
happenings in the past. 

9. Non-response and Inadequate or Incomplete Response. Bias 
due to non-response results if in a house-to-house survey the 
respondent is not available in spite of repeated visits by the 
investigator or if the respondent refuses to furnish the information. 
Incomplete response error is introduced if the respondent is unable 
to furnish information on certain questions or if he is unwilling or 
even refuses to answer certain questions. 


10. Improper Coverage. If the objectives of the survey are 
not precisely stated in clear-cut terms, this may result in (i) the 
inclusion in the survey of certain units which are not to be included, 
or (ii) the exclusion of certain units which were to be included in 
the survey under the objectives. For example, in a census to determine 
the number of individuals in the age group, say, 15 years to 
55 years, more or less serious errors may occur in deciding whom 
to enumerate unless the particular community or area is specified, 
and also the time at which the age is to be reckoned. 


11. Compiling Errors, i.e., wrong calculations or entries made 
during the processing and analysis of the data. Various operations 
of data processing such as editing and coding of the responses, 
punching of cards, tabulation and summarising the original 
observations made in the survey are a potential source of error. 
Compilation errors are subject to control through verification, 
consistency checks, etc. 


12. Publication Errors. Publication errors, i.e., the errors 
committed during presentation and printing of tabulated results, are 
basically due to two sources. The first refers to the mechanics of 
publication: the proofing error and the like. The other, which is of 
a more serious nature, lies in the failure of the survey organisation 
to point out the limitations of the statistics. 


Remark. In a census, sampling error is completely absent so 
that the total error is non-sampling error. A sample survey, on the 
other hand, contains both sampling and non-sampling errors. As 
pointed out earlier, in a sample survey non-sampling error can be 
effectively controlled by : 


(i) Employing qualified and trained personnel for the planning 
and execution of the survey. 

(ii) Using more sophisticated statistical techniques and equipment 
for the processing and analysis of the data. 

(iii) Providing adequate supervisory checks on the field work. 

(iv) Pretesting or conducting a pilot survey. 

(v) Thorough editing and scrutiny of the results. 

(vi) Effective checking of all the steps in the processing and 
analysis of the data. 

(vii) More effective follow-up of non-response cases. 

(viii) Imparting thorough training to the investigators for 
efficient conduct of the enquiry. 

Moreover, the sampling error in a sample survey can be 
minimised by taking an adequately large sample selected by an 
appropriate sampling plan. The selection of the sample by 
‘Probability Sampling’ such as Simple Random Sampling, Stratified 
Random Sampling, Systematic Sampling, etc. [see § 15.9.2, § 15.10, 
§ 15.11, § 15.12], usually gives quite reliable results. In practice the 
use of simple random sampling, with suitable adoption of 
stratification of the universe if it is heterogeneous, or the technique of 
multistage random sampling if there are clearly demarcated stages, 
gives fairly good results, often better than those given by a complete 
census. 


15.8.2. Biased and Unbiased Errors. In any statistical 
investigation, whether a complete census or a sample survey, the 
statistical errors can also be classified as : 

(i) Biased Errors, and (ii) Unbiased Errors. 


Biased Errors. Biased errors creep in because of : 


(i) Bias on the part of the enumerator or investigator whose 
personal beliefs and prejudices are likely to affect the results of the 
enquiry. 

(ii) Bias in the measuring instrument or the equipment used 
for recording the observations. 


(iii) Bias due to faulty collection of the data and in the statis- 
tical techniques and the formulae used for the analysis of the 


data. 


(iv) Respondents’ bias. An appeal to the pride or prestige of 
an individual introduces a bias called prestige bias, by virtue of 
which he may upgrade his education, occupation, income, etc., or 
understate his age, thus resulting in wrong answers. Moreover, 
respondents may furnish wrong information to safeguard their 
personal interests. For example, for income-tax purposes a person 
may give an understatement of his salary or income or assets. 

(v) Bias due to Non-response. [See item 9, Non-sampling 
Errors, page 807.] 

(vi) Bias in the Technique of Approximations. If, while rounding 
off, each individual value is approximated either to the next highest 
or to the next lowest number so that all the errors move in the same 
direction, there is a bias towards overstatement or understatement 
respectively. For example, if the figures are rounded off to the next 
highest hundred, each of the values 305 and 396 will be recorded as 
400; if they are rounded off to the next lowest hundred, each will be 
recorded as 300. 


Owing to their nature, the biased errors have a tendency to 
grow in magnitude with an increase in the number of observations 
and hence are also known as Cumulative Errors. Thus, the 
magnitude of the biased errors is directly proportional to the number 
of observations. 

Unbiased Errors. The errors are termed unbiased errors 
if the estimated or approximated values are likely to err on either 
side, i.e., if the chance of making an over-estimate is almost the same 
as the chance of making an under-estimate. Since these errors 
move in both directions, the errors in one direction are more 
or less neutralised by the errors in the opposite direction and 
consequently the ultimate result is not much affected. For example, if 
the individual values, say, 385, 415, 355, 445 are rounded to the 
nearest complete unit, i.e., hundred, each one of them would be 
recorded as 400. In this case the values 385 and 355 give 
over-estimating errors of magnitudes 15 and 45 respectively while the 
values 415 and 445 give under-estimating errors of magnitudes 15 
and 45 respectively, and in the ultimate result (approximation) they 
get neutralised. Thus, if the number of observations is quite large, 
these unbiased errors will not affect the final result much. Since 
the errors in one direction compensate for the errors in the other 
direction, unbiased errors are also termed Compensatory Errors. 
Thus we observe that the unbiased errors do not grow with the 
increase in the number of observations but have a tendency to 
get neutralised and are minimum in the ultimate analysis, and the 
magnitude of the unbiased errors is inversely proportional to the 
number of items. 
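
The contrast between cumulative and compensatory errors can be verified on the four values used above. The sketch below is illustrative only; the helper names are hypothetical, and the error of each item is taken as recorded value minus actual value, so that over-estimates are positive and under-estimates negative.

```python
values = [385, 415, 355, 445]   # the values used in the text


def round_to_nearest_hundred(v):
    """Unbiased rule: round a non-negative integer to the nearest hundred."""
    return (v + 50) // 100 * 100


def round_up_to_hundred(v):
    """Biased rule: always round up to the next highest hundred."""
    return -(-v // 100) * 100


def total_error(recorded, actual):
    """Sum of signed errors (recorded minus actual)."""
    return sum(r - a for r, a in zip(recorded, actual))


nearest = [round_to_nearest_hundred(v) for v in values]
upward = [round_up_to_hundred(v) for v in values]

print("nearest hundred:", nearest, "total error:", total_error(nearest, values))
print("next highest   :", upward, "total error:", total_error(upward, values))
```

With rounding to the nearest hundred the signed errors +15, −15, +45 and −45 cancel, whereas with rounding up every error has the same sign and the total keeps growing as more items are added.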


15.8.3. Measures of Statistical Errors (Absolute and Relative 
Errors). A measure of the statistical errors is provided by absolute 
and relative errors. 

Absolute Error. An absolute error (A.E.) is the difference 
between the true value of any particular observed item or variable 
and its estimated or approximated value. Symbolically, we may 
write : 

$$\text{A.E.} = |a - e| \qquad (15.9)$$

where a is the actual value, e is the estimated value, and $|a - e|$ 
represents the modulus value of $(a - e)$, i.e., its value after ignoring 
the negative sign. For example, if a value 54,87,350 is approximated 
to the nearest lakh, it can be taken as 55 lakhs. Thus 

$$\text{A.E.} = |a - e| = |54{,}87{,}350 - 55{,}00{,}000| = |-12{,}650| = 12{,}650$$

The magnitude of the absolute error is quite independent of 
the magnitude of the actual value. For example, in the above case 
A.E. remains the same for all those values which have the digit in the 
ten-thousand place greater than 5. Consequently, absolute errors 
cannot be compared meaningfully. For example, the above error of 
12,650 in a value of 54,87,350 may be quite insignificant or 
immaterial as compared with an absolute error of 10 in a value of 
500. Moreover, these errors are in the units of measurement and 
as such A.E.s in different units cannot be compared meaningfully. In 
order to facilitate comparison of the errors, they are reduced to 
pure numbers which are independent of the units of measurement. 
This is done by calculating the Relative Error (R.E.), which is defined 
as the ratio of the absolute error to the actual value. Symbolically, 

$$\text{R.E.} = \frac{\text{A.E.}}{\text{Actual value}} = \frac{|a - e|}{a} \qquad (15.10)$$

Thus in the above example, the R.E., which relates the magnitude 
of the error to the magnitude of the true value, is given by 

$$\text{R.E.} = \frac{12{,}650}{54{,}87{,}350} = 0.0023$$

The relative error may also be expressed as a percentage: 

$$\text{Percentage R.E.} = \frac{12{,}650}{54{,}87{,}350} \times 100 = 0.23$$


In statistical analysis, relative error is a much more useful 
measure than the absolute error as it provides a useful coefficient 
(a pure number independent of units of measurement) for comparing 
the degree of error of different sets of data. 
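
The two measures can be computed directly from their definitions. The short sketch below reproduces the worked figures (a value of 54,87,350 approximated to 55 lakhs) and the comparison with an error of 10 in a value of 500; the function names are hypothetical and the second estimate of 510 is assumed only to produce an absolute error of 10.

```python
def absolute_error(actual, estimate):
    """A.E. = |a - e|."""
    return abs(actual - estimate)


def relative_error(actual, estimate):
    """R.E. = |a - e| / a, a pure number free of units."""
    return absolute_error(actual, estimate) / actual


# 54,87,350 approximated to the nearest lakh (55,00,000).
ae_large = absolute_error(5_487_350, 5_500_000)
re_large = relative_error(5_487_350, 5_500_000)

# An absolute error of 10 in a value of 500 (estimate assumed to be 510).
ae_small = absolute_error(500, 510)
re_small = relative_error(500, 510)

print(ae_large, round(re_large, 4), round(100 * re_large, 2))
print(ae_small, round(re_small, 4), round(100 * re_small, 2))
```

Although 12,650 is a far larger absolute error than 10, its relative error is roughly a tenth of the other, which is the sense in which relative errors allow a meaningful comparison.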

15.9. Types of Sampling. The choice of an appropriate 
sampling design is of paramount importance in the execution of a 
sample survey and is generally made keeping in view the objectives 
and scope of the enquiry and the type of the universe to be sampled. 
The sampling techniques may be broadly classified as follows : 

(i) Purposive or Subjective or Judgment Sampling. 

(ii) Probability Sampling. 
(iii) Mixed Sampling. 

15.9.1. Purposive or Subjective or Judgment Sampling. In 
this method, a desired number of sample units is selected 
deliberately or purposely depending upon the object of the enquiry so 
that only the important items representing the true characteristics 
of the population are included in the sample. 

An obvious and serious drawback of this sampling scheme is 
that it is highly subjective in nature since the selection of the sample 
depends entirely on the personal convenience, beliefs, biases and 
prejudices of the investigator. For example, if in a socio-economic 
survey it is desired to study the standard of living of the people in 
New Delhi and if the investigator wants to show that the standard 
has gone down, then he may include individuals in the sample only 
from the low income stratum of the society and exclude the people 
from the posh colonies like South Extension, Greater Kailash, Jor 
Bagh, Chanakyapuri and so on. This method cannot be worked out 
for large samples and is expected to give good results in small 
samples only, provided the selection of the sample is representative. 
This can be achieved if the investigator is thoroughly skilled and 
experienced in the field of enquiry and knows the limitations of 
such a selection. Further, since this scheme does not involve the 
principle of probability, estimation of the sampling error depends 
upon hypotheses which are rarely met in practice. 


15.9.2. Probability Sampling. Probability sampling provides 
a scientific technique of drawing samples from the population 
according to some laws of chance in which each unit in the universe 
has some definite pre-assigned probability of being selected in the 
sample. The different types of probability sampling are those in which : 

(i) Each sample unit has an equal chance of being selected. 


(ii) Sampling units have varying probability of being selected. 


(iii) Probability of selection of a unit is proportional to the 
sample size. 


15.9.3. Mixed Sampling. A sampling design in which the 
sample units are selected partly according to some probability laws 
[given in § 15.9.2 (i), (ii), (iii)] and partly according to a fixed 
sampling rule (no use of chance) is known as Mixed Sampling. 

Some of the important types of sampling schemes covered by 
§ 15.9.2 and § 15.9.3 are given below : 


(i) Simple Random Sampling 
(ii) Stratified Random Sampling 
(iii) Systematic Sampling 
(iv) Multistage Sampling 
(v) Quasi Random Sampling 
(vi) Area Sampling 
(vii) Simple Cluster Sampling 
(viii) Multistage Cluster Sampling 
(ix) Quota Sampling. 


We shall discuss below some plans briefly. 

Remark. The selection of the sample based on the theory 
of probability is also known as random selection, and sometimes 
probability sampling is also called Random Sampling. It should be 
borne in mind that in ordinary language randomness means 
haphazardness or being without any purpose or definite law, but in 
Statistics randomness is a well defined concept. According to Simpson and 
Kafka, “Random samples are characterised by the way in which they 
are selected. Randomness is not used in the sense of haphazard or 
hit or miss.” 


15.10. Simple Random Sampling (S.R.S.). Simple random 
sampling is the technique in which the “sample is so drawn that each 
and every unit in the population has an equal and independent chance 
of being included in the sample.” 


If the unit selected in any draw is not replaced in the 
population before making the next draw, then it is known as simple 
random sampling without replacement (srswor), and if it is replaced 
before making the next draw, then the sampling plan is called 
simple random sampling with replacement (srswr). Thus, simple 
random sampling with replacement always amounts to sampling 
from an infinite population, even though the population is finite. 


Remark. Alternative Definition of Srswor. If a sample of size n 
is drawn without replacement from a population of size N, then there 
are ${}^{N}C_{n}$ possible samples. Simple random sampling is the 
technique of selecting the sample so that each of these ${}^{N}C_{n}$ 
samples has an equal chance or probability 

$$p = \frac{1}{{}^{N}C_{n}} \qquad (15.11)$$

of being selected. 


If sampling is done with replacement, then there are $N^n$ 
possible samples of size n. In this case simple random sampling 
(srswr) gives equal chance 

$$p = \frac{1}{N^{n}} \qquad (15.12)$$

for each of the $N^n$ samples to be selected. 
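
For a small population the counts in (15.11) and (15.12) can be confirmed by listing every possible sample. The sketch below assumes a hypothetical population of N = 4 units labelled A to D and samples of size n = 2; unordered selections are counted without replacement and ordered draws with replacement.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb

population = ["A", "B", "C", "D"]   # hypothetical units
N, n = len(population), 2

# Without replacement: every unordered subset of size n.
srswor_samples = list(combinations(population, n))
p_srswor = Fraction(1, comb(N, n))          # equation (15.11)

# With replacement: every ordered sequence of n draws.
srswr_samples = list(product(population, repeat=n))
p_srswr = Fraction(1, N ** n)               # equation (15.12)

print(len(srswor_samples), p_srswor)
print(len(srswr_samples), p_srswr)
```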


15.10.1. Selection of a Simple Random Sample. Proper care 
must be exercised to ensure that the sample drawn is random and 
therefore representative of the population. A random sample 
may be selected by : 


(i) Lottery Method. 
(ii) Use of Table of Random Numbers. 


Lottery Method. The simplest method of drawing a random 
sample is the lottery system. This consists in identifying each 
and every member or unit of the population with a distinct number 
which is recorded on a slip or a card. These slips should be as homo- 
geneous as possible in shape, size, colour, etc., to avoid the human 
bias. The lot of these slips or cards is a kind of miniature of the 
population for sampling purposes. If the population is small, then 
these slips are put in a bag and thoroughly shuffled and then as 
many slips as units needed in the sample are drawn one by one, the 
slips being thoroughly shuffled after each draw. The sampling units 
corresponding to the numbers on the selected slips will constitute 
a random sample. For example, let us suppose that we want to 
draw a random sample of 10 individuals from a population of 100 
individuals. We assign the numbers 1 to 100, one number to each 
individual of the population and prepare 100 identical slips bearing 
the numbers from 1 to 100. These slips are then placed in a bag or 
container and shuffled thoroughly. Finally, a sample of 10 slips is 
drawn out one by one. The individuals bearing the numbers on 
these selected slips will constitute the desired sample. 


If the population to be sampled is fairly large then we may 
adopt the lottery method in which all the slips or cards are placed 
in a metal cylinder which is thrown into a large rotating drum work- 
ing under a mechanical system. The rotation of the drum results in 
thorough mixing or randomisation of the cards. Then a sample of 
desired size n is drawn out of the container mechanically and the 
corresponding n sample units constitute the desired random sample. 


The lottery method gives a sample which is quite independent 
of the properties of the population. It is one of the best and most 
commonly used methods of selecting random samples. It is quite 
frequently used in the random draw of prizes, in the Tambola games 
and so on. 
Remark. In sampling with replacement (srswr) each card 
drawn is replaced in the container before making the next draw. 
But in sampling without replacement (srswor) cards once drawn are 
not returned. Since cards are drawn one by one, a thorough 
mixing is required before the next draw. 
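
The lottery method, with or without replacement, can be imitated on a computer. The sketch below is a simulation under assumptions, not a prescribed procedure: a pseudo-random shuffle stands in for the physical mixing of the slips, and the function name and parameters are hypothetical.

```python
import random


def lottery_draw(population_size, sample_size, replace=False, rng=None):
    """Draw slips numbered 1..population_size one by one.

    replace=False imitates srswor (a drawn slip is kept out of the bag);
    replace=True imitates srswr (the slip is put back before the next draw).
    """
    rng = rng or random.Random()
    if not replace and sample_size > population_size:
        raise ValueError("cannot draw more slips than the bag contains")
    slips = list(range(1, population_size + 1))
    drawn = []
    for _ in range(sample_size):
        rng.shuffle(slips)                      # thorough mixing before each draw
        slip = slips[-1] if replace else slips.pop()
        drawn.append(slip)
    return drawn


# A sample of 10 individuals from a population of 100, as in the text.
without_replacement = lottery_draw(100, 10, replace=False, rng=random.Random(1))
with_replacement = lottery_draw(100, 10, replace=True, rng=random.Random(1))
print(without_replacement)
print(with_replacement)
```

Without replacement the ten numbers are necessarily distinct; with replacement the same slip may turn up more than once.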


Use of Table of Random Numbers. The lottery method 
described above is quite time consuming and cumbersome to use if 
the population to be sampled is sufficiently large. Moreover, in this 
method it is not humanly possible to make all the slips or cards 
exactly alike and as such some bias is likely to be introduced. 
Statisticians have avoided this difficulty by considering the random 
sampling number series. Most of these series are the results of actual 
sampling operations recorded for future use. The most practical and 
inexpensive method of selecting a random sample consists in the use 
of ‘Random Number Tables’, which have been so constructed that 
each of the digits 0, 1, 2, ..., 9 appears with approximately the same 
frequency and independently of each other. If we have to select a 
sample from a population of size N (≤ 99), then the digits can be 
combined two by two to give pairs from 00 to 99. Similarly, if 
N ≤ 999 or N ≤ 9999 and so on, then combining the digits three by 
three (or four by four and so on), we get numbers from 000 to 999 or 
0000 to 9999 and so on. Since each of the digits 0, 1, 2, ..., 9 occurs 
with approximately the same frequency and independently of each 
other, so does each of the pairs 00 to 99, triplets 000 to 999 or 
quadruplets 0000 to 9999 and so on. 


The method of drawing a random sample comprises the follow- 
ing steps : 


(i) Identify N units in the population with the numbers 1 to N. 


(ii) Select at random, any page of the ‘random number table’ 
and pick up the numbers in any row, column or diagonal at random. 


(iii) The population units corresponding to the numbers selected 
in step (ii) constitute the random sample. 


We give below different sets of random numbers commonly 
used in practice. The numbers in these tables have been subjected 
to various statistical tests for randomness of a series and their ran- 
domness has been well established for all practical purposes. 


1. Tippett's (1927) Random Number Tables. (Tracts for Computers, 
No. 15, Cambridge University Press). 

Tippett's number tables consist of 10,400 four-digited numbers, 
giving in all 10,400 × 4, i.e., 41,600 digits selected at random from 
the British census reports. 


2. Fisher and Yates (1938) Tables (in Statistical Tables for 
Biological, Agricultural and Medical Research) comprise 15,000 digits 
arranged in twos. Fisher and Yates obtained these tables by drawing 
numbers at random from the 10th to 19th digits of A.S. Thomson's 
20-figure logarithmic tables. 
3. Kendall and Babington Smith's (1939) random tables consist 
of 100,000 digits grouped into 25,000 sets of 4-digited random 
numbers (Tracts for Computers, No. 24, Cambridge University Press). 


4. Rand Corporation (1955), (Free Press, Illinois) random num- 
ber tables consist of one million random digits consisting of 200,000 
random numbers of 5 digits each. 


5. Table of Random Numbers (The ISI series, Calcutta) by 
C.R. Rao, Mitra and Mathai. 


The first forty sets from Tippett’s Table have been reproduced 
below for illustrating their use in the selection of random samples. 


Table 15.1 
RANDOM NUMBERS 


2952 6641 3992 9792 7979 5911 3170 5624 
4167 9524 1545 1396 7203 5356 1300 2693 
2370 7483 3408 2762 3563 1089 6913 7691 
0560 5246 1112 6107 6008 8126 4233 8776 
2754 9143 1405 9025 7002 6111 8816 6446 


Example 15.1. Draw a random sample (without replacement) of 
15 students from a class of 450 students. 

Solution. First of all we identify the 450 students of the 
class with numbers from 1 to 450. Starting with the first number 
in the above extract from Tippett’s random number tables and 
moving row-wise, we pick out one by one the three-digited numbers 
less than or equal to 450, till 15 numbers ≤ 450 are obtained. In 
this process the numbers over 450 are discarded and the repeated 
numbers, if any, are taken only once. 

The above numbers grouped in threes are : 
295, 266, 413, 992, 979, 279, 795, 911, 317, 056, 244, 
167, 952, 415, 451, 396, 720, 353, 561, 300, 269, 323, 
707, 483, 340. 

Thus the students corresponding to the numbers 
295, 266, 413, 279, 317, 56, 244, 167, 
415, 396, 353, 300, 269, 323 and 340 
constitute the desired random sample of size 15. 

Example 15.2. Use Table 15.1 to draw a random sample 
without replacement of size 5 from a population of 24 units. 

Solution. First of all we identify the 24 units in the population 
with numbers from 1 to 24. Then in Table 15.1, starting with 
the first number and moving row-wise, we pick out the numbers in 
pairs, one by one, ignoring those numbers which are greater than 24 
and counting the repeated numbers only once, till a selection of 5 
numbers below 25 is made. These numbers are : 11, 24, 15, 13, 03. 

Thus the units in the population corresponding to these five 
numbers constitute the required random sample of size 5. 
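
The selections in Examples 15.1 and 15.2 can be reproduced mechanically from the extract in Table 15.1. The sketch below is one possible reading of the procedure described in the two solutions: the digits are read row-wise as one continuous stream, grouped into blocks of the required width, and a block is kept only if it lies between 1 and N and has not been selected already. The function name is hypothetical.

```python
TABLE_15_1 = """
2952 6641 3992 9792 7979 5911 3170 5624
4167 9524 1545 1396 7203 5356 1300 2693
2370 7483 3408 2762 3563 1089 6913 7691
0560 5246 1112 6107 6008 8126 4233 8776
2754 9143 1405 9025 7002 6111 8816 6446
"""


def draw_from_table(table, population_size, sample_size, width):
    """Select distinct unit numbers by reading `width` digits at a time."""
    digits = "".join(table.split())          # row-wise stream of digits
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Discard 0, numbers above N, and repeats (sampling without replacement).
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                return chosen
    raise ValueError("table extract exhausted before the sample was complete")


example_15_1 = draw_from_table(TABLE_15_1, population_size=450, sample_size=15, width=3)
example_15_2 = draw_from_table(TABLE_15_1, population_size=24, sample_size=5, width=2)
print(example_15_1)
print(example_15_2)
```

The `ValueError` corresponds to the situation, noted in the Remark that follows, in which the extract is too short to yield a sample of the desired size.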


Remark. In this method a large number of digits are rejected 
[as in Example 15.2 all numbers above 24 are rejected], and thus we 
need large tables even to draw small samples. It may even happen 
that the extract given from the table of random numbers is so small 
that we are not able to draw a random sample of the desired size. 
This difficulty can be overcome by assigning more than one number 
to each of the sampling units. For instance, in Example 15.2, the 
first unit may be assigned the numbers : 

1, 1+24, 1+2×24, 1+3×24, and so on, 
i.e., 1, 25, 49, 73, 97, 121, ... and so on. 
Similarly the 2nd unit may be assigned the numbers : 
2, 26, 50, 74, 98, 122, ... and so on. 
Finally, the last unit may be assigned the numbers : 
0, 24, 48, 72, 96, 120, ... 

Following this procedure, the desired sample of size 5 will be 
given by the units corresponding to the numbers 4, 5, 15, 17, 18 as 
explained below : 

No. from Table 15.1        No. of the Sampled Unit 
29 = 5 + 24                          5 
52 = 4 + 2×24                        4 
66 = 18 + 2×24                      18 
41 = 17 + 24                        17 
39 = 15 + 24                        15 
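
Assigning the numbers j, j+24, j+2×24, ... to unit j amounts to taking the remainder on division by 24, with a remainder of 0 standing for the last (24th) unit. The following minimal sketch, with a hypothetical function name, applies this rule to the first five pairs of digits in Table 15.1.

```python
def unit_for(number, population_size):
    """Map a table number to a unit: remainder on division by N, with 0 -> N."""
    remainder = number % population_size
    return remainder if remainder != 0 else population_size


pairs_from_table = [29, 52, 66, 41, 39]   # first five pairs in Table 15.1

sampled_units = []
for number in pairs_from_table:
    unit = unit_for(number, 24)
    if unit not in sampled_units:         # sampling without replacement
        sampled_units.append(unit)

print(sampled_units)
print(sorted(sampled_units))
```

One point the text leaves open: among the two-digit numbers 00 to 99, the remainders 0 to 3 occur five times each and the others four times, so discarding 96 to 99 would be needed to keep the chances exactly equal.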


15.10.3. Merits and Limitations of Simple Random Sampling 

Merits 1. Since it is a probability sampling, it eliminates 
the bias due to the personal judgment or discretion of the 
investigator. Accordingly, the sample selected is more representative of 
the population than in the case of judgment sampling. 

2. Because of its random character, the efficiency of the 
estimates can be judged by considering the standard errors 
of their sampling distributions [see Remark 2 above]. 

3. The theory of random sampling is highly developed so 
that it enables us to obtain the most reliable and maximum 
information at the least cost, and results in savings in time, money and 
labour. 


Demerits 1. Simple random sampling requires an up-to-date 
frame, i.e., a complete and up-to-date list of the population units to 
be sampled. In practice, since this is not readily available in many 
enquiries, it restricts the use of this sampling design. 


2. In field surveys, if the area of coverage is fairly large, 
then the units selected in the random sample are expected to be 
scattered widely geographically and thus it may be quite time 
consuming and costly to collect the requisite information or data. 


3. If the sample is not sufficiently large, then it may not be 
representative of the population and thus may not reflect the 
true characteristics of the population. 

4. The numbering of the population units and the preparation 
of the slips is quite time consuming and uneconomical, 
particularly if the population is large. Accordingly, this method cannot 
be used effectively to collect most of the data in social sciences. 

5. For a given degree of accuracy, simple random sampling 
usually requires a larger sample as compared to stratified sampling 
discussed below. [See § 15.11.] 

6. Sometimes a simple random sample gives results which are 
highly improbable in nature, i.e., whose probability is very small. 
For example, a random selection of 13 cards from a pack of 52 
cards might give all thirteen cards of spades, say. The probability 
of the happening of such an event in practice is extremely small. 
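
How small this probability is can be worked out from (15.11): the thirteen spades form just one of the ${}^{52}C_{13}$ equally likely hands. The short calculation below is a sketch using only the standard library.

```python
from fractions import Fraction
from math import comb

number_of_hands = comb(52, 13)               # all possible 13-card selections
p_all_spades = Fraction(1, number_of_hands)  # exactly one of them is all spades

print(number_of_hands)
print(float(p_all_spades))
```

There are 635,013,559,600 possible hands, so the chance of drawing all thirteen spades is of the order of one in 635 thousand million.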


15.11. Stratified Random Sampling 
When the population is heterogeneous with respect to the 
variable or characteristic under study, then the technique of 
stratified random sampling is used to obtain more efficient results. 
Stratification means division into layers or groups. Stratified random 
sampling involves the following steps : 

1. Stratify the given population into a number of sub-groups 
or sub-populations known as strata such that : 

(a) The units within each stratum (sub-group) are as homo- 
geneous as possible. 

(b) The differences between various strata are as marked as 
possible, i.e., the stratum means differ as widely as possible. 


(c) Various strata are non-overlapping. This means each and 
every unit in the population belongs to one and only one stratum. 


The criterion used for the stratification of the universe into 
various strata is known as the stratifying factor. In general, 
geographical, sociological or economic characteristics form the basis of 
stratification of the given population. Some of the commonly used 
stratifying factors are age, sex, income, occupation, educational level, 
geographic area, economic status, etc. Stratification will be effective 
only if it possesses the three characteristics (a), (b), (c) enumerated 
above. In many fields of highly skewed distributions, stratification is 
a very effective and valuable tool. 


Thus in stratified sampling the given population of size N is 
divided into, say, k relatively homogeneous strata of sizes 
$N_1, N_2, \ldots, N_k$ respectively such that 

$$N = \sum_{i=1}^{k} N_i .$$

2. Draw simple random samples (without replacement) from 
each of the k strata. Let an srswor of size $n_i$ be drawn from the ith 
stratum (i = 1, 2, ..., k) such that 

$$\sum_{i=1}^{k} n_i = n ,$$

where n is the total sample size from a population of size N. 

The sample of $n = \sum_{i=1}^{k} n_i$ units is known as a Stratified Random 
Sample (without replacement) and the technique of drawing such a 
sample is known as Stratified Random Sampling. 
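
The two steps above can be sketched in code. Everything specific in this illustration is assumed: a population of 100 numbered units already divided into three income strata, and stratum sample sizes of 5, 3 and 2. Within each stratum the draw is a simple random sample without replacement.

```python
import random


def stratified_sample(strata, sample_sizes, rng=None):
    """Draw an srswor of size n_i from each stratum and return them by name."""
    rng = rng or random.Random()
    sample = {}
    for name, units in strata.items():
        # random.sample selects without replacement within the stratum.
        sample[name] = rng.sample(units, sample_sizes[name])
    return sample


# Hypothetical, non-overlapping strata covering units 1..100.
strata = {
    "low income": list(range(1, 51)),      # N_1 = 50
    "middle income": list(range(51, 81)),  # N_2 = 30
    "high income": list(range(81, 101)),   # N_3 = 20
}
sample_sizes = {"low income": 5, "middle income": 3, "high income": 2}

sample = stratified_sample(strata, sample_sizes, rng=random.Random(0))
total_n = sum(len(units) for units in sample.values())
print(sample)
print(total_n)
```

Because each stratum is sampled separately, no stratum can be left out of the sample, however the random draws fall.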
Remark. The basic problems in stratified random sampling 
are : 

(i) The stratification of the universe into different strata or 
sub-groups. 

(ii) The determination of the sizes of the samples to be drawn 
from different strata. 

Both these points are equally important. A faulty stratification 
cannot be compensated even by taking large samples. 

15.11.1. Allocation of Sample Size in Stratified Sampling. 

To obtain efficient results, the allocation of the sample sizes $n_i$ 
(i = 1, 2, ..., k), i.e., the number of units to be selected from the 
ith stratum, the total sample size $n = n_1 + n_2 + \ldots + n_k$ being given, 
is done in the following ways : 

(i) Proportional Allocation 
(ii) Disproportionate Allocation. 

The allocation of sample sizes is termed proportional if the 
sampling fraction, i.e., the ratio of the sample size to the population 
size, remains the same in all the strata. Mathematically, the principle 
of proportional allocation gives : 

$$\frac{n_1}{N_1} = \frac{n_2}{N_2} = \cdots = \frac{n_k}{N_k} \qquad (15.13)$$

Since $\sum n_i = n$ and $\sum N_i = N$, the common value of these ratios is 
$n/N$, so that 

$$n_1 = \left(\frac{n}{N}\right) N_1, \quad n_2 = \left(\frac{n}{N}\right) N_2, \quad \ldots, \quad n_k = \left(\frac{n}{N}\right) N_k \qquad (15.14)$$



Disproportionate Allocation. In this case an equal number of 
items is taken from every stratum regardless of how the stratum is 
represented in the population. Sometimes the proportion may vary 
from stratum to stratum also. In short, a stratified sample in which 
the number of items selected from each stratum is independent of its 
size is called a disproportionate stratified sample. 


15.11.2. Merits and Demerits of Stratified Random Sampling 


Merits 1. More Representative Sample. A properly constructed 
and executed stratified random sampling plan overcomes the 
drawbacks of purposive sampling and random sampling and still 
enjoys the virtues of both these methods by dividing the given 
universe into a number of homogeneous subgroups with respect to 
the purposive characteristic and then using the technique of random 
sampling in drawing samples from each stratum. A stratified 
random sample gives adequate representation to each stratum or 
important section of the population and eliminates the possibility of 
any important group of the population being completely ignored. 
Stratified random sampling provides a more representative 
sample of the population and accordingly results in less variability 
as compared with other sampling designs. 


2. Greater Precision. As a consequence of the reduction 
in the variability within each stratum, stratified random sampling 
provides more efficient estimates as compared with simple random 
sampling. For instance, the sample estimate of the population 
mean is more efficient in both proportional and Neyman's allocation 
of the samples to different strata in stratified random sampling as 
compared with the corresponding estimate obtained in simple 
random sampling. 


3. Administrative Convenience. The division of the population
into relatively homogeneous subgroups brings administrative
convenience. Unlike random samples, the stratified samples are
expected to be localised geographically. This ultimately results in
reduction in cost and saving in time in terms of collection of the 
data, interviewing the respondents and supervision of the field 
work. 


4. Sometimes it is desired to achieve different degrees of
accuracy for different segments of the population. Stratified random
sampling is the only sampling plan which enables us to obtain
results of known precision for each of the strata.


5. Quite often, the sampling problems differ significantly
in different segments of the population. In such a situation, the
problem can be tackled effectively through stratified sampling by
regarding each segment of the population as a different stratum and
approaching each of them independently during sampling.

Demerits 1. As already pointed out, the success of stratified
random sampling depends on :


(i) Effective stratification of the universe into homogeneous
strata, and


(ii) Appropriate size of the samples to be drawn from each of
the strata.


If stratification is faulty, the results will be biased. The error 
due to wrong stratification cannot be compensated even by taking 
large samples. 


The allocation of the sample sizes to different strata requires
an accurate knowledge of the population size in each stratum,
$N_i$, i=1, 2,..., k [cf. proportional allocation, $n_i \propto N_i$]. Further,
Neyman's principle of optimum allocation, viz., $n_i \propto N_i S_i$, requires
an additional knowledge of the variability or standard deviation of
each stratum. $N_i$ and $S_i$ (i=1, 2,..., k) are usually unknown, and this is
a serious limitation to the effective use of stratified random
sampling.


2. Disproportionate stratified sampling requires the assignment
of weights to different strata and if the weights assigned are faulty,
the sample will not be representative and might give biased
results.


15.12. Systematic Sampling. Systematic sampling is a slight
variation of simple random sampling in which only the first
sample unit is selected at random and the remaining units are
automatically selected in a definite sequence at equal spacing from one
another. This technique of drawing samples is usually recommended
if a complete and up-to-date list of the sampling units, i.e., the
frame, is available and the units are arranged in some systematic
order such as alphabetical, chronological, geographical order, etc.
This requires the sampling units in the population to be ordered in
such a way that each item in the population is uniquely identified by
its order, for example the names of persons in a telephone directory,
the list of voters, etc.


Let us suppose that N sampling units in the population are
arranged in some systematic order and serially numbered from 1 to
N and we want to draw a sample of size n from it such that

$$N=nk \quad\Rightarrow\quad k=\frac{N}{n} \qquad \ldots(15.15)$$

where k is usually called the sampling interval.


Systematic sampling consists in selecting any unit at random 
from the first k units numbered from 1 to k and then selecting every 
kth unit in succession subsequently. Thus, if the first unit selected 
at random is ith unit, then the systematic sample of size n will 
consist of the units numbered 


i, i+k, i+2k,..., i+(n-1)k.

The random number ‘i’ is called the random start and its value,
in fact, determines the whole sample. As an example, let us suppose
that we want to select 50 voters from a list of voters containing 1,000
names arranged systematically. Here


n=50 and N=1,000, so that

$$k=\frac{N}{n}=\frac{1000}{50}=20$$

We select any number from 1 to 20 at random and the corresponding
voter in the list is selected. Suppose the selected number
is 6. Then, the systematic sample will consist of 50 voters in the
list at serial numbers : 6, 26, 46, 66,..., 966, 986.
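The selection rule for a systematic sample (a random start i between 1 and k, followed by every kth unit) can be sketched in a few lines. The function name `systematic_sample` and its `start` parameter are illustrative assumptions; the sketch also assumes, as in the text, that N is an exact multiple of n.

```python
import random


def systematic_sample(N, n, start=None):
    """Serial numbers i, i+k, ..., i+(n-1)k of a systematic sample (units numbered 1..N)."""
    if N % n != 0:
        raise ValueError("this sketch assumes N = n * k exactly")
    k = N // n  # sampling interval
    # Random start between 1 and k, unless a start is supplied explicitly.
    i = random.randint(1, k) if start is None else start
    return [i + j * k for j in range(n)]


# The voters example: N = 1,000, n = 50, random start 6.
sample = systematic_sample(1000, 50, start=6)
print(sample[:4], sample[-2:])
```

Passing `start=6` reproduces the serial numbers quoted in the voters example; omitting it draws the random start afresh on each call.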


Obviously we can select k possible systematic samples starting
from the 1st unit, 2nd unit,..., kth unit, which are enumerated
below :


Sample number    Sample Composition (Units in the Sample)

1                1,  1+k,  1+2k, ...,  1+jk, ...,  1+(n-1)k
2                2,  2+k,  2+2k, ...,  2+jk, ...,  2+(n-1)k
...
i                i,  i+k,  i+2k, ...,  i+jk, ...,  i+(n-1)k
...
k                k,  2k,   3k,   ...,  (j+1)k, ...,  nk


Thus the k rows of the table give the k systematic samples. The
columns of the above table are also sometimes referred to as n
strata.

Remark. Systematic random sample appears like a stratified 
random sample with one unit per stratum. 

15.12.1. Merits and Demerits 

Merits 1. Systematic sampling is very easy to operate and
checking can also be done quickly. Accordingly, it results in
considerable saving in time and labour relative to simple random
sampling or stratified random sampling.

2. Systematic sampling may be more efficient than simple 
random sampling provided the frame is complete and up-to-date 
and the units are arranged serially in a random order like the names 
in a telephone directory where the units are arranged in alphabetical 
order. However, even in alphabetical arrangement, certain amount 
of non-random character may persist. 

Demerits 1. Systematic sampling works well only if the com- 
plete and up-to-date frame is available and if the units are randomly 
arranged. However, these requirements are not generally fulfilled. 

2. Systematic sampling gives biased results if there are periodic
features in the frame and the sampling interval (k) is equal to
or a multiple of the period.
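The effect of periodicity can be seen in a small constructed illustration. The population below is hypothetical: 70 units whose values repeat with period 7, sampled with interval k = 7, so that every systematic sample picks up the same phase of the cycle.

```python
def all_systematic_samples(values, k):
    """Return the k possible systematic samples for sampling interval k."""
    return [values[start::k] for start in range(k)]


period = 7
# Hypothetical frame: the unit with serial number u+1 carries the value u mod 7.
values = [u % period for u in range(70)]

samples = all_systematic_samples(values, k=period)
sample_means = [sum(s) / len(s) for s in samples]
population_mean = sum(values) / len(values)
print(sample_means, population_mean)
```

Each sample here consists of ten identical values, so the sample mean is determined entirely by the random start and can lie anywhere from 0 to 6, while the population mean is 3.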


The relative efficiency of systematic sampling over stratified
random sampling or simple random sampling without replacement
(srswor) largely depends on the properties of the population
under study. Without a knowledge of the structure of the population
no hard and fast rules can be laid down and no situations can be
pinpointed where the use of systematic sampling is to be
recommended.


15.13. Cluster Sampling. In this case the total population is
divided, depending on the problem under study, into some recognisable
sub-divisions which are termed clusters, and a simple random
sample of these clusters is drawn. We then observe, measure and
interview each and every unit in the selected clusters.


For example, if we are interested in obtaining the income or
opinion data in a city, the whole city may be divided into N different
blocks or localities (which determine the clusters) and a simple
random sample of n blocks is drawn. The individuals in the selected
blocks determine the cluster sample.


In cluster sampling the following points should be borne
in mind :


(i) Clusters should be as small as possible consistent with the
cost and limitations of the survey, and


(ii) The number of sampling units in each cluster should be
approximately the same.


Thus cluster sampling is not to be recommended if we are sampling
areas in a city where there are private residential houses, business
and industrial complexes, apartment buildings, etc., with widely
varying numbers of persons or households.


15.14. Multistage Sampling. Instead of enumerating all
the sampling units in the selected clusters, one can obtain better
and more efficient estimators by resorting to subsampling within
the clusters. The technique is called two-stage sampling, the clusters
being termed primary units and the units within the clusters
secondary units.


The above technique may be generalised to what is called
multistage sampling. As the name suggests, multistage sampling
refers to a sampling technique which is carried out in various stages.
Here the population is regarded as made of a number of primary
units each of which is further composed of a number of secondary
stage units and so on, till we ultimately reach the desired sampling
unit in which we are interested. For example, if we are interested
in obtaining a sample of, say, n households from a particular State,
the first stage units may be districts, the second stage units may be
villages in the districts and the third stage units will be households in
the villages. Each stage thus results in a reduction of the sample
size.

Multistage sampling consists in sampling first stage units by
some suitable method of sampling. From among the selected first
stage units, a sub-sample of secondary stage units is drawn by some
suitable method of sampling which may be the same as or different from
the method used in selecting first stage units. Further stages may be
added to arrive at a sample of the desired sampling units.
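A two-stage version of this procedure can be sketched as follows, assuming simple random sampling without replacement at both stages (the text allows any suitable method at each stage). The function name `two_stage_sample`, the village and household labels, and the sizes used are all hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, m_units, seed=None):
    """Draw n_clusters primary units, then m_units secondary units within each."""
    rng = random.Random(seed)
    # Stage 1: simple random sample of primary units (clusters).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: simple random sub-sample of secondary units in each chosen cluster.
    return {c: rng.sample(clusters[c], min(m_units, len(clusters[c]))) for c in chosen}


# Hypothetical frame: 10 villages of 20 households each.
villages = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(1, 21)]
    for v in range(1, 11)
}
sample = two_stage_sample(villages, n_clusters=3, m_units=5, seed=1)
for village, households in sample.items():
    print(village, households)
```

Only the household lists of the three selected villages are consulted at the second stage, which is why a second stage frame is needed only for the selected first stage units.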

Merits. Multistage sampling is more flexible as compared to
other methods of sampling. It is simple to carry out and results in
administrative convenience by permitting the field work to be
concentrated and yet covering a large area.

The most practical advantage of multistage sampling is that we
need the second stage frame only for those units which are selected
in the first stage sample and this leads to great saving in operational
cost. Consequently this technique is of great utility, particularly
in surveys of under-developed areas or pockets where no up-to-date
and accurate frame is generally available for subdivision of the
material into reasonably small sampling units.

Demerits. Errors are likely to be larger in this method than
in any other method. The variability of the estimates under this
method may be greater than that of estimates based on simple
random sampling. This variability depends on the composition of
the primary units. In general, multistage sampling is usually less
efficient than a suitable single stage sampling of the same size.

15.16. Quota Sampling. Quota sampling may be looked upon
as a special form of stratified sampling. In this method, the investigator
is told in advance the number of the sample units he is to
examine or enumerate from the stratum assigned to him. In the
language of stratified sampling, the quota of the units to be examined
by the investigator from the stratum assigned to him is fixed for
each investigator. The sampling quotas may be fixed according to
some specified characteristic such as income group, sex, occupation,
political or religious affiliations, etc. The choice of the particular
units or individuals for investigation is left to the investigators
themselves. They are merely given the quotas with the specific
instruction to inspect (interview) a specified number of units
(informants) from each stratum. Quite often the investigator does not
make a random selection of the sample units. He usually applies
his judgment and discretion in the choice of the sample and tries
to get the desired information as quickly as possible. Moreover,
in case of non-response from some of the selected sample units (due
to certain reasons like non-availability of the respondent even after
repeated calls by the investigator, or the inability or refusal of the
informant to furnish the requisite information), the investigator
selects some fresh units himself to complete his quota. In doing so,
he is likely to include some purposive units to get the desired
information.


Merits 1. Quota sampling is a stratified-cum-purposive or
judgment sampling and thus enjoys the benefits of both. It aims at
making the best use of stratification without incurring high costs
involved in following any probabilistic method of sampling. There
is considerable saving in time and money as the sampled units may
be so selected that they are close together.

2. If carefully executed by skilled and experienced investigators
who are aware of the limitations of judgment sampling and if
proper controls or checks are imposed on the investigators, quota
sampling is likely to give quite reliable results.


Demerits. Since quota sampling is a restricted type of
judgment sampling, it suffers from all the limitations of judgment
or purposive sampling, viz.,


(i) It may be biased because of the personal beliefs and
prejudices of the investigator in the selection of the units and/or in
inspecting them.


(ii) It may involve the bias due to the substitution of the 
sampled units from where there is no response. 


(iii) Since it is not based on random sampling, the sampling 
error cannot be estimated. 


In spite of all these shortcomings, the technique of quota
sampling is generally adopted in market surveys, political surveys, 
or surveys of opinion poll where it is very difficult, rather impossi- 
ble, to identify the strata in advance. 

EXERCISE 15.1

1. Distinguish between a population and a sample. Discuss the relative
merits of census and sample methods of collecting data.

2. Distinguish between census and sample methods. Compare their 
relative merits and demerits. 

[Mysore Uni. B. Com., Nov. 1981 ; C.A. (Intermediate), Nov. 1976]


3. Explain briefly (your answer should not exceed about 300 words)
why a sample survey is usually preferred to a census survey. Give one example
of a situation where a census survey is imperative.

[C.A. (Intermediate), Nov. 1974]

4. Describe briefly the Law of Statistical Regularity and state its
applications in the economic and social spheres.

(Nagarjuna Uni. B. Com., April 1980)

5. (a) What are the different sources of errors in a sample survey ?
(b) Describe briefly how these can be controlled.

6. Distinguish clearly between sampling and non-sampling errors. Is
it true to say that non-sampling errors do not arise in a sample survey ? How
will you control these errors ?

7. What is a statistical error ? Explain the difference between a
statistical error and a ‘mistake’. Describe the various types of errors.

(Allahabad Uni. B. Com., 1979)

8. What are “statistical errors” ? What are the sources of error ?
Explain the methods of measuring them.

(Mysore Uni. B. Com., Nov. 1981)

9. Define :

(i) Statistical Errors. [C.A. (Intermediate), May 1976]

(ii) Biased and Unbiased Errors. [C.A. (Intermediate), Nov. 1975]

(iii) Sampling and Non-sampling Errors.

(iv) Absolute and Relative Errors.

10. (a) Discuss the technique of Judgment or Purposive Sampling.

(b) What is Simple Random Sampling ? Discuss its relative merits and 
demerits. 

(c) Describe the various methods of drawing a random sampling from a 
finite population. 

11. Distinguish between simple random sampling and purposive 
sampling. 

Describe a procedure for drawing a random sample of size 5 from a
population of size 17 (with replacement method).
[I.C.W.A. (Final), June 1980]

12. What are random sampling numbers ? Outline the different
random number series and explain how these are used to select a simple random
sample.


13. A carefully designed “Sample” is said to be better than a poorly
planned and executed “Census”. Bring out the merits of sample method of
enquiry and at least three of the methods to obtain representative data in a
sample.
(Punjab Uni. B. Com., 1978)

14. Explain briefly the reasons for the increasing popularity of sampling
methods. Explain briefly any two methods of sampling which help us to obtain
a representative sample.
[C.A. (Intermediate), May 1979]

15. Distinguish between simple random sampling and stratified random
sampling.

Describe a procedure of drawing a random sample of size 3 from a
population of size 11 by ‘without replacement’ method.
[I.C.W.A. (Final), Dec. 1979]


16. Bring out the important features of 
(i) Systematic Sampling. 


(ii) Stratified Sampling. 
(Delhi Uni. B. Com., 1982) 


17. “Mere size, of course, does not assure representativeness in a
sample. A small random or stratified sample is apt to be much superior to
a large but badly selected sample.”

Discuss this statement pointing out the advantages, disadvantages and
limitations of the sample method.
(Punjab Uni. B. Com., Sept. 1977)

18. Distinguish between random sampling and stratified sampling.
Suppose it is desired to survey petrol buying habits of car owners in a particular
city, how would you proceed about it ? Draw a brief questionnaire for the
purpose. (Punjab Uni. B. Com., 1979)

19. (a) Write a note on sampling and its uses.
[C.A. (Intermediate), May 1981 ; May 1982]

(b) Describe briefly any three methods of sampling.
[C.A. (Intermediate), Nov. 1981]


20. Three sampling plans to determine the quality of the manufactured 
product are given below : 
(i) Inspect every 10th item. 
(ii) Inspect one item every 10 minutes. 
(iii) Inspect a random sample of 6 during each hour's production. 
State the sampling design in each case. Which one is the most
appropriate ? Give reasons.

(Bombay Uni. B. Com., November 1980)
Ans. (i) and (ii) : Systematic random sampling ; (iii) Simple random sampling.

21. Describe the following sampling plans, and give their relative merits
and demerits.

(i) Judgment or Purposive Sampling.
(ii) Simple Random Sampling.
(iii) Stratified Random Sampling.
(iv) Cluster Sampling.
(v) Multistage Sampling.
(vi) Quota Sampling.
(vii) Systematic Sampling.


16 


Interpolation and Extrapolation 


16.1. Introduction. Let us suppose that we are given two
variables x and y, x being the independent variable and y the dependent
variable. We say that y is a function of x and we write it
as :

$$y=f(x)=y_x \ \text{(say)} \qquad \ldots(16.1)$$

Suppose we are given the values $x_0, x_1, x_2, \dots, x_n$ of x and let
the corresponding values of y be $y_0, y_1, y_2, \dots, y_n$ respectively. If
we want to estimate the value of $y_x$ for any value of x between the
limits $x_0$ and $x_n$, this can be done by applying the technique of
Interpolation. For example, suppose we are given the population
census figures ($y_x$) for the years (x) 1931, 1941, 1951, 1961 and 1971
and we want to estimate the population for any year between 1931
and 1971, say, 1958, 1965, etc. This is done by the method of
interpolation. However, if we have to estimate the population for the
period outside the range 1931-1971, say, for 1926 or 1975, the
technique is known as extrapolation. Interpolation is defined as
the technique of estimating the value of $y_x$ for any intermediate value
of the variable x.


Theile defines interpolation as “the art of reading between the
lines of a table.”


Interpolation or extrapolation is the technique of obtaining the 
most likely estimate of a certain quantity (dependent variable) from 
the given relevant facts, under certain assumptions. 


Remarks 1. It should be clearly understood that there is no
difference between interpolation and extrapolation as far as estimation
methods and underlying assumptions are concerned. The only
difference between the two is that interpolation relates to estimation
of a value within the given range of the series while extrapolation
deals with obtaining forecasts or projections (in the past or
future) beyond the given range of the series.

2. The values of the independent variable x are known as
arguments and the corresponding values of the dependent variable
are known as entries.


16.1.1. Assumptions. The techniques of interpolation and
extrapolation are based on the following assumptions :


(i) There are no sudden jumps or falls in the values of the
dependent variable (entries) for the periods under consideration. In
other words, the values should relate to periods of normal and
stable economic conditions, i.e., the given data should be free from
all sorts of abnormalities and all sorts of random and irregular
fluctuations like earthquakes, wars, floods, epidemics, labour
strikes, lock-outs, economic boom and depression, and political
disturbances, etc., which may result in violent ups and downs in the
values of $y_x$. This means that the data can be represented by a
smooth and continuous curve. Mathematically, it means that the
given data can be represented by a polynomial of certain degree
which is determined by the following fundamental theorem in
algebra :


“One and only one polynomial curve of degree less than or equal
to n passes through a given set of (n+1) distinct points.”


Thus, if we are given a set of 4 entries (values of y) then $y_x$
can be represented as a polynomial of 3rd degree, viz.,
$y_x=a_0+a_1x+a_2x^2+a_3x^3$. Similarly, if we are given only 3 entries,
then $y_x$ can be expressed as a second degree polynomial in x, viz.,
$y_x=b_0+b_1x+b_2x^2$, and so on.


All the formulae of interpolation are based on the fundamental 
assumption that the given data can be expressed as a polynomial 
function (of certain degree) with fair degree of accuracy. 


(ii) In the absence of evidence to the contrary, there is
regularity in the fluctuations so that the rate of change in the given
data has been uniform. Thus, in the example of census population
given above, it is assumed that the rate of growth has been
consistent (uniform) from the years 1931 to 1971.


Remarks 1. In order to arrive at valid estimates, a fairly
good number of arguments and entries should be given.


2. If a number of consecutive missing values are to be
mated from the given data, then the estimates are unlikely to be 
reliable. 


16.1.2. Uses of Interpolation. 1. The need for interpolating
missing observations or making forecasts or projections arises
in a number of disciplines like economics, business, social sciences,
actuarial work, population studies, etc. Some of the examples are
ascertaining the most likely prices, business changes, mid-term
intercensal population figures, mid-term figures of industrial
production from their totals and so on.


2. The interpolation technique has been used to derive the
formulae for the computation of median and mode in case of
continuous frequency distribution. [See formulae in Chapter 5].


3. Interpolation techniques are used to fill in the gaps in the
statistical data for the sake of continuity of information. These gaps 
in the data may be due to the following reasons : 


(i) Due to certain financial and organisational difficulties, 
data may not be collected on census basis and sampling techniques 
may be used to obtain the relevant information. The intermediate 
gaps are then filled by interpolation methods. 


(ii) Data for some periods may not be collected due to certain 
unavoidable circumstances. 


(iii) Figures of some of the periods may be erased, destroyed 
or lost due to certain reasons like improper handling or random and 
natural causes like fire, floods, etc. 


Interpolation techniques help us to obtain the best (most
likely) substitutes for the original missing values under certain
assumptions discussed in §16.1.1 and these methods are entirely
different from those by which actual data are obtained.


Remark. Accuracy of Estimates. Since the interpolation
techniques are based on certain assumptions which may not hold
good in practice, the estimates so obtained may not always be
accurate or reliable. It is not possible to ascertain the error of
estimate. The accuracy of the interpolated value depends on :


(i) A knowledge of possible fluctuations in the values of the 
phenomenon under study, which is provided by the available data 
at our disposal. 

(ii) A knowledge about the course of events which affect the 
value of the phenomenon under investigation. If we know that the 
estimated value of the given phenomenon at a particular period is 
affected by random factors like political riots, floods, etc., then the 
interpolated value should be modified in the light of this information 
and a better estimate may be obtained. 

16.2. Methods of Interpolation. The methods of interpolation
or extrapolation may be broadly classified as follows :


(i) Graphic Method.
(ii) Algebraic Method.


We shall discuss these methods in detail in the following
sections.

16.3. Graphic Method. This method consists in representing
the given data geometrically by means of a graph. The independent
variable is plotted along the X-axis and the dependent variable
is plotted along the Y-axis. The various points so obtained are
joined together by a smooth freehand curve. From this curve,
which will represent the general trend of the relationship between
the two variables, we can read the value of one variable corresponding
to any given value of the other variable within the given range
of the series. [For graphs of time series, see §4.4.4.]


If the value of y (or x) lies outside the given range, i.e., if it is
a case of extrapolation, then the smoothed curve is extended to
the required point and then the estimated value is read from the
graph.

For example, if we want to find the value of y for x=a, then
at x=a, draw a perpendicular to the X-axis, meeting the smoothed
curve at P. From point P draw a line PQ parallel to the X-axis meeting
the Y-axis at Q. Then the estimated value of $y_x$ at x=a is OQ.


Graphic method is specially useful in the following situations :

(i) The series is correlated (either positively or negatively).

(ii) In case of historical or temporal series which exhibit
periodical or cyclical fluctuations.


Merits and Demerits. Obviously, graphic method is a very
simple and quick method of studying the relationship between two
variables. It is also very flexible as it can be used to study all types
of trends, linear as well as non-linear.


The main drawback of this method is its subjective nature
due to the inherent bias of the investigator.


Example 16.1. The following table gives the profit of a firm
for the period 1971 to 1976. The figure for 1975 is missing. Interpolate
the same by graphic method :


Year                      1971  1972  1973  1974  1975  1976
Profits (Rs. in lakhs)     110   120   115   125     ?   130
(Osmania Uni. B. Com., April 1978)


Solution. We have to interpolate the missing figure (profits
in lakhs of rupees) for the year 1975 by the graphic method. We
plot the given data on the graph, taking the independent variable
years (x) along X-axis and the dependent variable profits (in lakhs
of Rs.) along Y-axis. At x=1975, draw a perpendicular to X-axis
meeting the curve in point P. From P draw PM parallel to X-axis
meeting Y-axis in M. Then estimated profits (in lakhs of Rs.) for
1975 are OM=127.5.


Fig. 16.1. Profits (in lakhs of Rs.) plotted against years, 1971 to 1976.


Remark. The answer does not seem to be accurate in this
case as the data exhibit a rise and fall in alternate years. The
estimated value is overstated in this case whereas actually it should
have been less than Rs. 125 lakhs.

16.4. Algebraic Methods. A number of algebraic methods,
based on the fundamental assumptions discussed in §16.1.1, have
been developed for interpolation or extrapolation of figures. Some
of the commonly used methods are enumerated below and will be
discussed in the following sections :


(i) Method of Parabolic Fitting.


(ii) Methods of Finite Differences. (Newton’s forward difference
and Newton’s backward difference formulae.)


(iii) The Binomial Expansion Method.


(iv) ‘Divided Differences Method’ and ‘Lagrange’s Method’
for unequal intervals of arguments.


16.5. Method of Parabolic Curve Fitting. The form of function
y=f(x) or its estimate for any given value of x can be obtained
by fitting a polynomial curve to the given set of observations provided
the values of x (arguments) are at equal intervals. The method
is based on the fundamental theorem of algebra, viz., ‘one and only
one polynomial curve of degree less than or equal to n passes through
a given set of (n+1) distinct points’. Thus if we are given (n+1)
equidistant arguments and entries then we can represent the function
y=f(x) by a polynomial of nth degree, viz.,

$$y=f(x)=a_0x^n+a_1x^{n-1}+a_2x^{n-2}+\dots+a_{n-1}x+a_n \qquad \ldots(16.2)$$

where $a_0, a_1, \dots, a_n$ are (n+1) constants whose values are to be
determined from the (n+1) equations obtained on substituting the
given values of x and y=f(x) in (16.2). Solving the (n+1) equations
so obtained and substituting the values of $a_0, a_1, \dots, a_n$ in (16.2) we
get the required form of function y=f(x), which can then be used to
estimate y for any given value of x. We shall explain the technique
by means of an example.


Remark. The method of parabolic curve fitting is quite time
consuming and tedious, particularly when the number of entries
given is large. For example, if we are given 5 entries, then y=f(x)
can be represented by a polynomial of 4th degree which involves 5
unknown constants. To determine these 5 constants we have to solve
5 simultaneous equations in these 5 unknowns, which is quite a difficult
job. Moreover, this method is applicable only if the arguments
are at equal intervals and cannot be applied for unequal intervals.


Example 16.2. Find f(x), given that f(0)=-3, f(1)=6, f(2)=8,
f(3)=12. State your assumptions, if any. Hence find f(6).
[I.C.W.A. (Intermediate), June 1976]


Solution. Since we are given 4 entries, we can regard f(x) to
be a polynomial of 3rd degree, say,

$$f(x)=ax^3+bx^2+cx+d \qquad \ldots(i)$$

This involves 4 unknown constants a, b, c and d.
Putting x=0, 1, 2 and 3 in (i) we get respectively :

$$\begin{aligned}
f(0)&=d=-3 &&\ldots(ii)\\
f(1)&=a+b+c+d=6\\
f(2)&=8a+4b+2c+d=8\\
f(3)&=27a+9b+3c+d=12
\end{aligned}$$

$$\begin{aligned}
\therefore\quad a+b+c&=6-d=9 &&\ldots(iii)\\
8a+4b+2c&=8-d=11 &&\ldots(iv)\\
27a+9b+3c&=12-d=15 &&\ldots(v)
\end{aligned}$$

(iv) - 2×(iii) gives :

$$\begin{aligned}
8a+4b+2c&=11\\
2a+2b+2c&=18\\
\hline
6a+2b&=-7 &&\ldots(vi)
\end{aligned}$$

2×(v) - 3×(iv) gives :

$$\begin{aligned}
54a+18b+6c&=30\\
24a+12b+6c&=33\\
\hline
30a+6b&=-3 &&\ldots(vii)
\end{aligned}$$

Multiplying (vi) by 5 and subtracting from (vii) we get :

$$\begin{aligned}
30a+6b&=-3\\
30a+10b&=-35\\
\hline
-4b&=32 \quad\Rightarrow\quad b=-8
\end{aligned}$$

Substituting in (vi) we get :

$$6a=-7+16=9 \quad\Rightarrow\quad a=\frac{3}{2}$$

Substituting the values of a and b in (iii) we have

$$c=9-a-b=9-\frac{3}{2}+8=\frac{31}{2}$$

Finally, substituting the values of a, b, c and d in (i) we get
the form of function f(x) as :

$$f(x)=\frac{3}{2}x^3-8x^2+\frac{31}{2}x-3 \qquad \ldots(viii)$$

Putting x=6 in (viii), we get :

$$f(6)=\frac{3}{2}\times 6^3-8\times 6^2+\frac{31}{2}\times 6-3=324-288+93-3=126$$
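The elimination carried out by hand in Example 16.2 can be checked mechanically. The sketch below is an illustration only: the helper names `fit_polynomial` and `evaluate` are hypothetical, and the elimination routine is a generic Gauss-Jordan reduction with exact fractions rather than the particular sequence of steps used in the solution.

```python
from fractions import Fraction


def fit_polynomial(xs, ys):
    """Coefficients (highest power first) of the polynomial through the given points."""
    n = len(xs)
    # Augmented matrix: one row [x^(n-1), ..., x, 1 | y] per given point.
    rows = [[Fraction(x) ** p for p in range(n - 1, -1, -1)] + [Fraction(y)]
            for x, y in zip(xs, ys)]
    for col in range(n):
        # Bring a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + c
    return result


# Data of Example 16.2: f(0) = -3, f(1) = 6, f(2) = 8, f(3) = 12.
coeffs = fit_polynomial([0, 1, 2, 3], [-3, 6, 8, 12])
value_at_6 = evaluate(coeffs, 6)
print(coeffs, value_at_6)
```

The coefficients come out as 3/2, -8, 31/2 and -3, and the value at x=6 is 126, in agreement with the worked solution.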


16.6. Method of Finite Differences. The calculus of finite
differences is a very convenient tool for interpolating figures when
the arguments are at equal intervals. Before we develop the different
techniques we shall first define the operators Δ and E, used extensively
in the theory of finite differences.


Let us suppose that the equidistant values of the independent
variable x are :
a, a+h, a+2h,..., a+nh ;


where a is known as the initial argument (or origin) and h is known as
the common interval of differencing. Let the corresponding values
of the dependent variable y=f(x) be


f(a), f(a+h), f(a+2h),..., f(a+nh) ;
which are known as the entries. For example, f(a+h) is the entry
corresponding to the argument a+h, and so on.

Operator $\Delta$. The difference operator $\Delta$ (capital delta of the Greek alphabet) is defined as :

$$\Delta f(x) = f(x+h) - f(x) ; \quad x = a,\ a+h,\ a+2h,\ \ldots \qquad \ldots(16.3)$$

In particular, taking x = a, a+h, ... in (16.3) we get :

$$\Delta f(a) = f(a+h) - f(a)$$
$$\Delta f(a+h) = f(a+h+h) - f(a+h) = f(a+2h) - f(a+h)$$
$$\Delta f(a+2h) = f(a+3h) - f(a+2h), \qquad \ldots(16.4)$$

and so on.

The differences defined in (16.4) are known as first order differences. By performing the operator $\Delta$ on the first order differences in (16.4) we get the second order differences, which are denoted by $\Delta^2$. In particular

$$\Delta^2 f(a) = \Delta[\Delta f(a)] = \Delta[f(a+h) - f(a)] = \Delta f(a+h) - \Delta f(a) \qquad \ldots(16.5)$$
$$= [f(a+2h) - f(a+h)] - [f(a+h) - f(a)]$$
$$= f(a+2h) - 2f(a+h) + f(a), \qquad \ldots(16.5a)$$

and so on.


Similarly proceeding we can obtain third and higher order differences denoted by $\Delta^3$, $\Delta^4$, ..., and so on. These differences of various orders can be conveniently expressed in the form of a table known as the Finite (Forward) Difference Table. The following table gives the arguments, entries and the first two differences. The higher order differences can be similarly obtained.

Table 16.1
FORWARD DIFFERENCE TABLE

Argument    Entry        First differences                      Second differences
x           y = f(x)     $\Delta f(x)$                          $\Delta^2 f(x)$

a           f(a)
                         $f(a+h) - f(a) = \Delta f(a)$
a + h       f(a+h)                                              $\Delta f(a+h) - \Delta f(a) = \Delta^2 f(a)$
                         $f(a+2h) - f(a+h) = \Delta f(a+h)$
a + 2h      f(a+2h)                                             $\Delta f(a+2h) - \Delta f(a+h) = \Delta^2 f(a+h)$
                         $f(a+3h) - f(a+2h) = \Delta f(a+2h)$
a + 3h      f(a+3h)                                             $\Delta f(a+3h) - \Delta f(a+2h) = \Delta^2 f(a+2h)$
                         $f(a+4h) - f(a+3h) = \Delta f(a+3h)$
a + 4h      f(a+4h)

f(a) is known as the first entry in the difference table and $\Delta f(a)$, $\Delta^2 f(a)$, $\Delta^3 f(a)$, ... are known as the leading differences. This table is also sometimes known as the diagonal (forward) difference table since the differences are shown in a diagonal pattern.
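The construction of the forward difference table can be sketched in a few lines of Python. This is only an illustration: the function name `forward_difference_table` and the sample entries are assumptions made for the sketch and do not come from the text.

```python
def forward_difference_table(entries):
    """Return [entries, first differences, second differences, ...]."""
    table = [list(entries)]
    while len(table[-1]) > 1:
        previous = table[-1]
        # Each new column holds the differences of successive values
        # in the column before it.
        table.append([b - a for a, b in zip(previous, previous[1:])])
    return table


# Hypothetical entries f(a), f(a+h), ..., f(a+4h), chosen only for illustration.
entries = [1, 4, 9, 16, 25]
table = forward_difference_table(entries)

# The leading differences are the first element of every column.
leading = [column[0] for column in table]

for order, column in enumerate(table):
    print(order, column)
```

The first element of each column is the quantity that Newton's forward formula uses; the last element of each column is the corresponding leading backward difference of the last entry.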

Operator $\nabla$. The backward difference operator, denoted by $\nabla$ (nabla), is defined as :

$$\nabla f(x+h) = f(x+h) - f(x) = \Delta f(x) \qquad \ldots(16.6)$$

In other words, the backward difference of f(x+h) is the same as the forward difference of f(x). The following table gives the arguments, entries and the backward differences up to 2nd order.

Table 16.2
BACKWARD DIFFERENCE TABLE

Argument    Entry        First differences                      Second differences
x           y = f(x)     $\nabla f(x)$                          $\nabla^2 f(x)$

a           f(a)
                         $f(a+h) - f(a) = \nabla f(a+h)$
a + h       f(a+h)                                              $\nabla f(a+2h) - \nabla f(a+h) = \nabla^2 f(a+2h)$
                         $f(a+2h) - f(a+h) = \nabla f(a+2h)$
a + 2h      f(a+2h)                                             $\nabla f(a+3h) - \nabla f(a+2h) = \nabla^2 f(a+3h)$
                         $f(a+3h) - f(a+2h) = \nabla f(a+3h)$
a + 3h      f(a+3h)                                             $\nabla f(a+4h) - \nabla f(a+3h) = \nabla^2 f(a+4h)$
                         $f(a+4h) - f(a+3h) = \nabla f(a+4h)$
a + 4h      f(a+4h)

Like Table 16.1, this table is also known as the diagonal backward difference table since the differences are shown in a diagonal pattern.
Remark. Unless stated otherwise, interval of differencing will 
always be taken as unity (one). 
Operator E. In case of the arguments at equal intervals 'h', the operator E is defined as :

$$E f(x) = f(x+h) \qquad \ldots(16.7)$$

i.e., the operator E is equivalent to increasing the argument by the interval of differencing.

Like second and higher order differences, the operator $E^2$ means that the operator E is performed twice. Thus

$$E^2 f(x) = E[E f(x)] = E f(x+h) = f(x+2h) \qquad \ldots(16.8)$$

In general,

$$E^r f(x) = f(x+rh) \qquad \ldots(16.9)$$

where h is the interval of differencing. Taking h = 1 in (16.9) we get

$$E^r f(x) = f(x+r) \qquad \ldots(16.9a)$$

In particular

$$E f(0) = E^1 f(0) = f(0+1) = f(1), \quad E^2 f(0) = f(0+2) = f(2), \quad E^3 f(0) = f(0+3) = f(3) \qquad \ldots(16.9b)$$

and so on ; provided the interval of differencing is unity.

Similarly

$$E f(1) = f(1+1) = f(2), \quad E^2 f(1) = f(1+2) = f(3) \qquad \ldots(16.9c)$$

These results are of much practical utility and should be committed to memory.


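Relation Between Operators E and $\Delta$. We have, by definition :

$$\Delta f(x) = f(x+h) - f(x) = E f(x) - f(x)$$
$$\Rightarrow \quad \Delta f(x) = (E - 1) f(x)$$

Since f(x) is arbitrary, we get the relation between the operators $\Delta$ and E as :

$$\Delta = E - 1 \quad \Rightarrow \quad 1 + \Delta = E \qquad \ldots(16.10)$$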
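The relation (16.10) can be checked numerically by treating E and $\Delta$ as functions that take a function and return a new one. The sketch below is illustrative only: the names `E` and `delta` and the test function `f` are assumptions chosen for the example, with the interval of differencing taken as unity.

```python
def E(f, h=1):
    """Shift operator: (E f)(x) = f(x + h)."""
    return lambda x: f(x + h)


def delta(f, h=1):
    """Forward difference operator: (delta f)(x) = f(x + h) - f(x)."""
    return lambda x: f(x + h) - f(x)


def f(x):
    # Illustrative function only; it does not appear in the text.
    return x**3 - 2 * x + 5


x = 4
lhs = delta(f)(x)                 # delta f(x)
rhs = E(f)(x) - f(x)              # (E - 1) f(x)

second = delta(delta(f))(x)       # delta^2 f(x)
expanded = f(x + 2) - 2 * f(x + 1) + f(x)   # right-hand side of (16.5a) with h = 1

print(lhs, rhs, second, expanded)
```

Both pairs agree, which is what (16.10) and (16.5a) assert for any function tabulated at equal intervals.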


Fundamental Theorem of Finite Differences. If f(x) is a polynomial of the nth degree in x, i.e., if

$$f(x) = a_0 x^n + a_1 x^{n-1} + a_2 x^{n-2} + \ldots + a_{n-1} x + a_n$$

then, the interval of differencing being unity,

$$\Delta^n f(x) = a_0\,(n!) = \text{constant} \quad \text{and} \quad \Delta^r f(x) = 0, \ \text{if } r > n \qquad \ldots(16.11)$$

In other words, the nth order difference of a polynomial of nth degree is constant and higher order differences are all zero. (For a general interval of differencing h the constant is $a_0\, n!\, h^n$.)

With this background, we are now set to discuss Newton's forward and backward difference formulae and also to discuss the binomial expansion method of estimating the missing values for arguments at equal intervals.

Remark. It should be understood that the operators E, $\Delta$ and $\nabla$ are defined only when the arguments are given to be at equal intervals.

16.7. Newton's Forward Difference Formula. This formula enables us to determine the polynomial form of the function f(x) and hence estimate its value for any given value of x. It is given by the formula :

$$f(x) = f(a) + u\,\Delta f(a) + \frac{u(u-1)}{2!}\,\Delta^2 f(a) + \frac{u(u-1)(u-2)}{3!}\,\Delta^3 f(a) + \ldots \qquad \ldots(16.12)$$

where x is the period of interpolation, i.e., it is the value corresponding to which the entry is required ; a is the first argument in the difference table and

$$u = \frac{\text{Period of interpolation} - \text{Period of origin}}{\text{Interval of differencing}} = \frac{x - a}{h} \qquad \ldots(16.13)$$

The last term in (16.12) depends on the number of entries given. If we are given (n+1) entries so that f(x) can be expressed as a polynomial of nth degree then, by the fundamental theorem of finite differences, $\Delta^n f(x)$ will be constant and higher differences will be zero. In that case (16.12) will contain terms up to $\Delta^n f(a)$ and we shall get f(x), on simplification of (16.12), as a polynomial of nth degree in x.


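Proof. The proof of (16.12) is very simple and depends on the definition of the operator E and the binomial expansion. We have from (16.13),

$$u = \frac{x - a}{h} \quad \Rightarrow \quad x = a + hu$$

$$f(x) = f(a + hu)$$
$$= E^u f(a) \qquad [\text{c.f. } (16.9)]$$
$$= (1 + \Delta)^u f(a) \qquad [\text{c.f. } (16.10)]$$
$$= \left(1 + {}^uC_1\,\Delta + {}^uC_2\,\Delta^2 + {}^uC_3\,\Delta^3 + \ldots\right) f(a)$$
$$= f(a) + {}^uC_1\,\Delta f(a) + {}^uC_2\,\Delta^2 f(a) + {}^uC_3\,\Delta^3 f(a) + \ldots$$
$$= f(a) + u\,\Delta f(a) + \frac{u(u-1)}{2!}\,\Delta^2 f(a) + \frac{u(u-1)(u-2)}{3!}\,\Delta^3 f(a) + \ldots$$

as desired.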
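Formula (16.12) translates directly into a short routine. The following is a minimal sketch, assuming equally spaced arguments; the function name `newton_forward` is an assumption of the sketch, and the data are the cube-root figures quoted in Example 16.5 of this chapter.

```python
def newton_forward(args, entries, x):
    """Evaluate Newton's forward difference formula (16.12) at x.

    args    : equally spaced arguments a, a+h, ..., a+nh
    entries : the corresponding entries f(a), ..., f(a+nh)
    """
    h = args[1] - args[0]
    u = (x - args[0]) / h
    diffs = list(entries)      # current column of the difference table
    estimate = 0.0
    coeff = 1.0                # u(u-1)...(u-k+1) / k!
    for k in range(len(entries)):
        estimate += coeff * diffs[0]          # diffs[0] is the kth leading difference
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        coeff *= (u - k) / (k + 1)
    return estimate


# Cube roots of 27, 28, 29 and 30 as quoted in Example 16.5.
args = [27, 28, 29, 30]
entries = [3.0000, 3.0369, 3.0727, 3.1074]

estimate_26 = newton_forward(args, entries, 26)
print(round(estimate_26, 4))
```

At a tabulated argument the routine returns the tabulated entry, which is a quick sanity check on any implementation of the formula.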


Remark. Formula (16.12) is also commonly known as the Newton-Gregory formula for forward interpolation. It is so called because it contains the values of the function f(x) from f(a) onwards to the right and none to the left of f(a). It is recommended for estimating the value of y = f(x) near the beginning (x > a) of a set of given values, because this formula is based on the leading differences which always occur in the beginning of the difference table.

Example 16.3. If $y = 2x^3 - x^2 + 3x + 1$, calculate the values of y corresponding to x = 0, 1, 2, 3, 4, 5 and form the table of differences.

Solution. Here we have

$$y_x = 2x^3 - x^2 + 3x + 1 \qquad \ldots(*)$$

Putting x = 0, 1, 2, 3, 4 and 5 in (*) we get respectively :

$$y_0 = 1$$
$$y_1 = 2 \times 1 - 1 + 3 + 1 = 5$$
$$y_2 = 2 \times 8 - 4 + 3 \times 2 + 1 = 19$$
$$y_3 = 2 \times 27 - 9 + 3 \times 3 + 1 = 54 - 9 + 9 + 1 = 55$$
$$y_4 = 2 \times 4^3 - 4^2 + 3 \times 4 + 1 = 2 \times 64 - 16 + 12 + 1 = 125$$
$$y_5 = 2 \times 5^3 - 5^2 + 3 \times 5 + 1 = 250 - 25 + 15 + 1 = 241$$

The difference table is given below :

TABLE OF DIFFERENCES

x     $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$

0       1
                 4
1       5                  10
                14                     12
2      19                  22                       0
                36                     12
3      55                  34                       0
                70                     12
4     125                  46
               116
5     241
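The polynomial of Example 16.3 also gives a convenient numerical check of the fundamental theorem: for a cubic with leading coefficient 2 and unit interval, the third differences should all equal 2 × 3! = 12 and the fourth differences should vanish. The short sketch below is illustrative; the variable names are assumptions of the sketch.

```python
from math import factorial


def y(x):
    # The polynomial of Example 16.3.
    return 2 * x**3 - x**2 + 3 * x + 1


values = [y(x) for x in range(6)]

# Build the successive difference columns (orders 1 to 4).
orders = []
column = values
for _ in range(4):
    column = [b - a for a, b in zip(column, column[1:])]
    orders.append(column)

third, fourth = orders[2], orders[3]
print(values)
print(third, fourth)
```

The constant third difference equals the leading coefficient times 3!, in line with the theorem for a unit interval of differencing.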


Example 16.4. Estimate the expectation of life at the age of 16 years by using the following data.

Age (in years) :               10     15     20     25     30     35
Expectation of life (years) :  35.4   32.3   29.2   26.0   23.2   20.4

[I.C.W.A. (Intermediate), Dec. 1977 ; Guru Nanak Dev Uni. B.Com., II, 1983]

Solution. Since the year of interpolation is in the beginning of the table we use Newton's forward difference formula as given below :

$$y_x = y_a + u\,\Delta y_a + \frac{u(u-1)}{2!}\,\Delta^2 y_a + \frac{u(u-1)(u-2)}{3!}\,\Delta^3 y_a + \ldots \qquad \ldots(*)$$

where u = (x − a)/h ; x is the year of interpolation, a is the year of origin and h is the common interval of differencing.

$$u = \frac{16 - 10}{5} = \frac{6}{5} = 1.2 \qquad \ldots(**)$$

The difference table is given below.

TABLE OF DIFFERENCES

x     $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$    $\Delta^5 y_x$

10    35.4
               -3.1
15    32.3                  0
               -3.1                   -0.1
20    29.2                 -0.1                     0.6
               -3.2                    0.5                     -1.5
25    26.0                  0.4                    -0.9
               -2.8                   -0.4
30    23.2                  0
               -2.8
35    20.4

Hence, from (*) and (**) we get :

$$y_{16} = 35.4 + 1.2\,(-3.1) + \frac{1.2\,(1.2-1)}{2!} \times 0 + \frac{1.2\,(1.2-1)(1.2-2)}{3!} \times (-0.1)$$
$$+ \frac{1.2\,(1.2-1)(1.2-2)(1.2-3)}{4!} \times (0.6) + \frac{1.2\,(1.2-1)(1.2-2)(1.2-3)(1.2-4)}{5!} \times (-1.5)$$

$$\therefore \quad y_{16} = 35.4 - 3.72 + 0 + 0.0032 + 0.00864 + 0.012096 = 31.7 \text{ years (approx.)}$$

Hence the estimated expectation of life at the age of 16 years by using the given data is 31.7 years.
Example 16.5. The following results are given :

$$\sqrt[3]{27} = 3.0000, \quad \sqrt[3]{28} = 3.0369, \quad \sqrt[3]{29} = 3.0727, \quad \sqrt[3]{30} = 3.1074$$

Using them, find $\sqrt[3]{26}$.
[I.C.W.A. (Intermediate), December 1981]

Solution. Let $y_x = \sqrt[3]{x}$. Then we are given the values of $y_{27}$, $y_{28}$, $y_{29}$ and $y_{30}$ and we want $y_{26}$. Since the values are given at equal intervals we can apply Newton's forward interpolation formula.

TABLE OF DIFFERENCES

x     $y_x$      $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$

27    3.0000
                 0.0369
28    3.0369                  -0.0011
                 0.0358                       0
29    3.0727                  -0.0011
                 0.0347
30    3.1074

Newton's formula gives :

$$y_x = y_a + u\,\Delta y_a + \frac{u(u-1)}{2!}\,\Delta^2 y_a + \frac{u(u-1)(u-2)}{3!}\,\Delta^3 y_a \qquad \ldots(*)$$

where u = (x − a)/h ; x is the value of the argument for which $y_x$ is required, a is the 1st argument in the difference table and h is the common interval of differencing.

$$u = \frac{26 - 27}{1} = -1$$

Substituting in (*) and using the difference table above, we get :

$$y_{26} = 3.0000 - 1 \times 0.0369 + \frac{(-1)(-2)}{2!} \times (-0.0011)$$
$$= 3.0000 - 0.0369 - 0.0011$$
$$= 2.962$$

Example 16.6. Using an appropriate formula for interpolation, estimate the number of students who obtained less than 45 marks from the following :

Marks :            0-40    40-50    50-60    60-70    70-80
No. of students :   31      42       51       35       31

(Bangalore Uni. B. Com., April 1978)

Solution. Here we define :

$y_x$ : Number of students who obtained less than x marks, i.e., $y_x$ gives the 'less than' cumulative frequency (c.f.).

The 'less than' cumulative frequency and the difference table are given below.

TABLE OF DIFFERENCES

x     f     $y_x$ (less than c.f.)    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$

40    31     31
                                       42
50    42     73                                      9
                                       51                          -25
60    51    124                                    -16                          37
                                       35                           12
70    35    159                                     -4
                                       31
80    31    190

We want $y_{45}$. Since the value to be estimated is in the beginning of the table, we use Newton's forward difference formula given by

$$y_x = y_a + u\,\Delta y_a + \frac{u(u-1)}{2!}\,\Delta^2 y_a + \frac{u(u-1)(u-2)}{3!}\,\Delta^3 y_a + \frac{u(u-1)(u-2)(u-3)}{4!}\,\Delta^4 y_a \qquad \ldots(*)$$

Here x = 45, a = 40, h = 10, so that

$$u = \frac{x - a}{h} = \frac{45 - 40}{10} = 0.5$$

Substituting in (*) and using the difference table we get :

$$y_{45} = 31 + (0.5) \times 42 + \frac{0.5\,(0.5-1)}{2!} \times 9 + \frac{0.5\,(0.5-1)(0.5-2)}{3!} \times (-25) + \frac{0.5\,(0.5-1)(0.5-2)(0.5-3)}{4!} \times 37$$
$$= 31 + 21 - 1.125 - 1.5625 - 1.4453125$$
$$= 47.8671875 \approx 48$$

Hence the estimated number of students getting less than 45 marks is 48.


Example 16.7. From the following table, find the number of workers falling in the earning group of Rs. 25 to Rs. 35 :

Earnings in rupees      No. of workers
up to 10                  50
up to 20                 150
up to 30                 300
up to 40                 500
up to 50                 700
up to 60                 800

[Rajasthan Uni. M.A. (Econ.), Jan. 1976]

Solution. Here the function $y_x$ is defined as :
$y_x$ = Number of workers earning up to x rupees.

TABLE OF DIFFERENCES

x     $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$    $\Delta^5 y_x$

10     50
               100
20    150                  50
               150                      0
30    300                  50                     -50
               200                    -50                       0
40    500                   0                     -50
               200                   -100
50    700                -100
               100
60    800

We want to find the value ($y_{35} - y_{25}$).

Since the values to be estimated are in the beginning of the table, we shall use Newton's forward difference formula to estimate $y_{25}$ and $y_{35}$.

To find $y_{25}$ we have :

$$u = \frac{x - a}{h} = \frac{25 - 10}{10} = 1.5$$

$$y_{25} = 50 + 1.5\,(100) + \frac{1.5\,(1.5-1)}{2!} \times (50) + \frac{1.5\,(1.5-1)(1.5-2)}{3!} \times (0)$$
$$+ \frac{1.5\,(1.5-1)(1.5-2)(1.5-3)}{4!} \times (-50) + \frac{1.5\,(1.5-1)(1.5-2)(1.5-3)(1.5-4)}{5!} \times 0$$
$$= 50 + 150 + 18.75 + 0 - 1.171875 + 0$$
$$= 217.578125 \approx 218$$

Hence the number of workers earning up to rupees 25 is 218.

Similarly, to find $y_{35}$ we have

$$u = \frac{35 - 10}{10} = 2.5$$

$$y_{35} = 50 + (2.5)\,100 + \frac{2.5\,(2.5-1)}{2!} \times 50 + \frac{2.5\,(2.5-1)(2.5-2)}{3!} \times 0$$
$$+ \frac{2.5\,(2.5-1)(2.5-2)(2.5-3)}{4!} \times (-50) + \frac{2.5\,(2.5-1)(2.5-2)(2.5-3)(2.5-4)}{5!} \times 0$$
$$= 50 + 250 + 93.75 + 0 + 1.953125 + 0$$
$$= 395.703125 \approx 396$$

Hence the number of workers falling in the earning group of Rs. 25 to Rs. 35 is :

$$y_{35} - y_{25} = 396 - 218 = 178$$
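The arithmetic of Example 16.7 can be reproduced exactly with rational numbers, which avoids rounding until the last step. The sketch below is only an illustration under the same equal-interval assumption; the function name `newton_forward` and the variable names are assumptions of the sketch, while the data are those of the example.

```python
from fractions import Fraction


def newton_forward(args, entries, x):
    """Newton's forward difference formula, evaluated in exact fractions."""
    u = Fraction(x - args[0], args[1] - args[0])
    diffs = [Fraction(value) for value in entries]
    total, coeff = Fraction(0), Fraction(1)
    for k in range(len(entries)):
        total += coeff * diffs[0]             # kth leading difference
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        coeff *= (u - k) / (k + 1)            # builds u(u-1)...(u-k)/(k+1)!
    return total


earnings = [10, 20, 30, 40, 50, 60]
cumulative = [50, 150, 300, 500, 700, 800]   # workers earning up to x rupees

y25 = newton_forward(earnings, cumulative, 25)
y35 = newton_forward(earnings, cumulative, 35)

print(float(y25), float(y35))
print(round(y35) - round(y25))
```

Rounding each estimate to a whole number of workers before subtracting reproduces the figure obtained in the example.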
16.8. Newton's Backward Difference Formula. This formula is based on the backward differences $\nabla$ and is specially useful if the estimated value lies towards the end of the difference table. It uses the leading backward differences of the last entry in the table and is given by the formula :

$$f(x) = f(a+nh) + u\,\nabla f(a+nh) + \frac{u(u+1)}{2!}\,\nabla^2 f(a+nh) + \frac{u(u+1)(u+2)}{3!}\,\nabla^3 f(a+nh) + \ldots \qquad \ldots(16.14)$$

where a+nh is the last argument in the difference table ; $\nabla$, $\nabla^2$, ... are the leading backward differences of the last entry (and are given by the last diagonal of the backward difference table) and

$$u = \frac{\text{Period of interpolation} - \text{Last argument}}{\text{Interval of differencing}} = \frac{x - (a+nh)}{h} \qquad \ldots(16.15)$$


Example 16.8. The following table gives the census population of a town for the years 1931 to 1971. Estimate the population for the year 1965 by using an appropriate interpolation formula.

Year :               1931    1941    1951    1961    1971
Population ('000) :    46      66      81      93     101

Solution. Since the value to be interpolated lies towards the end of the given data, we shall use Newton's backward difference interpolation formula.

TABLE OF BACKWARD DIFFERENCES

x       f(x)    $\nabla f(x)$    $\nabla^2 f(x)$    $\nabla^3 f(x)$    $\nabla^4 f(x)$

1931     46
                 20
1941     66                  -5
                 15                        2
1951     81                  -3                       -3
                 12                       -1
1961     93                  -4
                  8
1971    101

In the notations of formulae (16.14) and (16.15) we have

$$u = \frac{1965 - 1971}{10} = -0.6$$

The leading backward differences of the last entry are given respectively by 8, −4, −1 and −3. Substituting in (16.14) we get

$$f(1965) = 101 + (-0.6) \times 8 + \frac{(-0.6)(-0.6+1)}{2!} \times (-4) + \frac{(-0.6)(-0.6+1)(-0.6+2)}{3!} \times (-1)$$
$$+ \frac{(-0.6)(-0.6+1)(-0.6+2)(-0.6+3)}{4!} \times (-3)$$
$$= 101 - 4.8000 + 0.4800 + 0.0560 + 0.1008$$
$$= 101.6368 - 4.8000 = 96.837 \text{ thousands}$$
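Formula (16.14) can be sketched in code in the same way as the forward formula, except that the last element of each difference column is used and the factors are u, u+1, u+2, and so on. The function name `newton_backward` is an assumption of this sketch; the data are those of Example 16.8.

```python
def newton_backward(args, entries, x):
    """Evaluate Newton's backward difference formula (16.14) at x."""
    h = args[1] - args[0]
    u = (x - args[-1]) / h
    diffs = list(entries)
    estimate, coeff = 0.0, 1.0        # coeff = u(u+1)...(u+k-1) / k!
    for k in range(len(entries)):
        estimate += coeff * diffs[-1]         # kth leading backward difference
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        coeff *= (u + k) / (k + 1)
    return estimate


years = [1931, 1941, 1951, 1961, 1971]
population = [46, 66, 81, 93, 101]            # in thousands

estimate_1965 = newton_backward(years, population, 1965)
print(round(estimate_1965, 4))
```

The difference columns are the same numbers as in a forward table; only the diagonal that is read off changes.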


EXERCISE 16.1

1. What do you understand by interpolation ? What are the underlying assumptions for the validity of the various methods used for interpolation ?

2. (a) What do you understand by 'Interpolation' ? Show clearly the necessity of interpolation by taking a few concrete examples.

(b) Give the assumptions and importance of the method of interpolation.
[Punjabi Uni. M.A. (Econ.), 1983]

(c) What is the utility of interpolation and extrapolation to a businessman ? Mention the chief methods of interpolation giving the conditions under which they are suitable.
(Kurukshetra Uni. B.Com. II, Sept., 1982)

3. What do you understand by the terms interpolation and extrapolation ? Discuss briefly their necessity and usefulness in statistical studies.
[Himachal Pradesh Uni. M.A. (Econ.), Feb. 1982]

4. Discuss the uses of the technique of interpolation in solving most of the economic problems.
[Himachal Pradesh Uni. M.A. (Econ.), Feb. 1983]

5. (a) Explain the terms 'argument' and 'entry' as used in interpolation.
(b) Define the difference operators $\Delta$ and E and show that $1 + \Delta = E$.

6. State Newton's interpolation formula for equal intervals and the assumptions underlying it.

7. State Newton's formula for interpolation and discuss some of its uses. Explain why Newton's formula is to be used for interpolating values at the top of the table.

8. Define the difference operators $\Delta$ and $\nabla$ and state Newton's Forward Difference and Backward Difference formulae. Explain clearly : (i) the situations where these formulae can be used, and (ii) the assumptions involved.

9. Explain the terms interpolation and extrapolation. Describe
(i) Graphic Method,
(ii) Algebraic Method,
of interpolation and discuss their relative merits and demerits.

10. The following table gives the values of a certain function y = f(x) for some equidistant values of x :

x :    8     14     20     26     32     38     44
y :   56    110    192    308    464    666    920

Find graphically (i) the value of y when x = 40 and (ii) the value of x when y = 400.
[I.C.W.A. (Final), July 1972, (O.S.)]
Ans. (i) y = 730, (ii) x = 30.
11. The following data relate to the amounts of income-tax paid by 600 businessmen during a year :

Income-tax paid          No. of businessmen
More than Rs. 500              600
More than Rs. 1,000            550
More than Rs. 1,500            425
More than Rs. 2,000            275
More than Rs. 2,500            100
More than Rs. 3,000             25

Find the number of businessmen who paid more than Rs. 1,200 but not more than Rs. 2,400 as income-tax. Use graphic method.

Ans. 370 (approx.).

12. Explain the 'parabolic curve fitting' method of interpolation.

The following table shows the values of an immediate life annuity for every £ 100 paid :

Age in years :   40     50     60     70
Annuity (£) :    6.2    7.2    9.1    12.0

Interpolate the annuity for the age 42 by parabolic curve method.
Ans. £ 6.333.

13. State Newton's formula for interpolation for equal intervals and the assumptions underlying it. Use it to find out the annual net premium payable at the age of 25 from the table given below :

Age :           20         24         28         32
Net premium :   0.01427    0.01581    0.01771    0.01996

[I.C.W.A. (Intermediate), December 1981]
Ans. 0.01625


14. Explain the meaning of interpolation. The following table gives the expectation of life at different ages. Find the expectation of life at the age of 49 years.

Age (Years) :                  35    45    55    65    75
Expectation of life (years) :  34    26    18    12    10

[Himachal Pradesh Uni. M.A. (Econ.), July 1984]

Ans. 22.688 years

15. If $L_x$ represents the numbers living at age x in a life table, interpolate by using Newton's method $L_x$ for the values of x = 24 and x = 29.

$L_{20} = 512$ ; $L_{30} = 439$ ; $L_{40} = 346$ ; $L_{50} = 243$.
(Rajasthan Uni. M.Com. 1977)

Ans. $L_{24} = 486$ ; $L_{29} = 447$.

16. State Newton's forward interpolation formula and use it to obtain $\sqrt{5.5}$ given :

$\sqrt{5} = 2.236$, $\sqrt{6} = 2.449$, $\sqrt{7} = 2.646$, $\sqrt{8} = 2.828$.

[I.C.W.A. (Intermediate), June 1974]
Ans. 2.345.

17. Given :
sin 45° = 0.7071, sin 50° = 0.7660, sin 55° = 0.8192, sin 60° = 0.8660, find sin 52°, by using any method of interpolation.
Ans. 0.7880.


18. From the data given below estimate the number of candidates who get marks more than 48 but not more than 60.

Marks      No. of candidates
36-45            8
46-55           10
56-65            8
66-75            6
76-85            4

[Punjabi Uni. M.A. (Econ.), 1982]
Ans. 27 − 20 = 7.

19. Using Newton's method of interpolation, find from the data given below, the number of persons in the income group between Rs. 20 and Rs. 25.

Income               No. of persons
Below rupees 10            20
Below rupees 20            45
Below rupees 30           115
Below rupees 40           210
Below rupees 50           325

[Punjabi Uni. M.A. (Economics), 1981 ; Punjab Uni. M.A. (Economics), 1980]
Ans. 76 − 45 = 31.

20. Estimate the number of candidates who get more than 48 but not more than 50 marks from the following :

Marks up to :         45     50     55     60     65
No. of candidates :   447    484    505    511    514

Ans. 13.

21. The following are the marks obtained by 492 candidates in a certain examination :

Not more than 40 marks, 212 candidates,
Not more than 45 marks, 296 candidates,


#», „эк ку д, уй А9232 


(a) Find the number of candidates who secured more than 42 but not more than 45 marks.
[Punjab Uni. M.A. (Econ.), 1980 ; Punjabi Uni. M.A. (Economics), 1979]

(b) Find the number of candidates who secured :
(i) more than 48 but not more than 50 marks.
(ii) less than 48 but not less than 45 marks.

Ans. (a) 296 − 256 = 40 ; (b) (i) 368 − 332 = 36, (ii) 332 − 296 = 36.

22. State Newton's backward difference formula, explaining clearly the assumptions involved. Under what situations do you recommend its use ?

23. What do you understand by interpolation ? Estimate the number of students for 1953 from the data given below.

Year :              1948    1950    1952    1954
No. of students :     50      79     102     113

(Guru Nanak Dev Uni. B.Com. II, Sept. 1983)

Ans. (By Newton's Backward formula) : 109.


24. The population of a district for different years is given below. Find out the population for 1982 :


Year 1977 1978 1979 1980 1981 
Population 7 2 36 14 16 
Coen (Guru Nanak Dev Uni. B.Com. II, April 1982) 


Ans. 297 (Backward difference formula.) 


25. From the following figures find the premium payable at the age of 40 :

Age (in years) :              20     25       30     35
Annual Premium (in Rs.) :     28     31.25    35     41

(Kurukshetra Uni. B.Com., 1980)
Ans. Rs. 51.

26. State Newton's backward interpolation formula with assumptions. Estimate by Newton's method of interpolation, the expectation of life at age 32 from the following data.

Age :                   10      15      20      25      30      35
Expectation of life :   35.3    32.4    29.2    26.1    23.2    20.5

[I.C.W.A. (Intermediate), June 1984]

Ans. 22.0948.

27. From the following data, estimate the number of persons earning wages between Rs. 25 and Rs. 35.

Wages in Rs.      No. of persons
up to 10                50
up to 20               150
up to 30               300
up to 40               500
up to 50               700

(Guru Nanak Dev Uni. B.Com., 1980)
Ans. $y_{25} = 218$ (Newton's forward formula.)
$y_{35} = 396$ (Newton's backward formula.)
$y_{35} - y_{25} = 396 - 218 = 178$.


16.9. Binomial Expansion Method for Interpolating Missing Values. Sometimes we may be given data in which the values of the independent variable x are at equal intervals but one, two or more values of the dependent variable (entries) may be missing. These missing values can be easily interpolated by using the following results of the calculus of finite differences.

(i) Suppose we are given (n+1) equidistant arguments but the entry corresponding to any one of them is missing. Thus we are given n entries and hence we can express the function y = f(x) by a polynomial of (n−1)th degree.

(ii) By the fundamental theorem of finite differences, since y = f(x) is a polynomial of (n−1)th degree, (n−1)th order differences are constant, and nth and higher order differences are zero.

Symbolically,

$$\Delta^{n-1} f(x) = \text{Constant} \quad \Rightarrow \quad \Delta^n f(x) = 0, \ \forall x \qquad \ldots(16.16)$$

In particular, taking x = a (the first argument), we get :

$$\Delta^n f(a) = 0 \qquad \ldots(16.16a)$$
$$\Rightarrow \quad (E - 1)^n f(a) = 0 \qquad [\text{From } (16.10)]$$

Expanding by binomial theorem, we get

$$\left[E^n - {}^nC_1\,E^{n-1} + {}^nC_2\,E^{n-2} - \ldots + (-1)^n\right] f(a) = 0$$
$$\Rightarrow \quad E^n f(a) - {}^nC_1\,E^{n-1} f(a) + {}^nC_2\,E^{n-2} f(a) - \ldots + (-1)^n f(a) = 0$$

Hence, using (16.9) we get

$$f(a+nh) - {}^nC_1\,f(a+\overline{n-1}\,h) + {}^nC_2\,f(a+\overline{n-2}\,h) - \ldots + (-1)^n f(a) = 0 \qquad \ldots(16.17)$$

From this equation the missing value can be interpolated.

If we are given (n+2) equidistant arguments and two entries are missing, i.e., as before n entries are given, then arguing as above we shall get

$$\Delta^n f(x) = 0, \ \forall x \qquad \ldots(16.18)$$

Since two values are missing, we need two equations to estimate these values. Taking x = a and a+h in (16.18) we get respectively :

$$\Delta^n f(a) = 0 \quad \text{and} \quad \Delta^n f(a+h) = 0 \qquad \ldots(16.19)$$
$$\Rightarrow \quad (E - 1)^n f(a) = 0 \quad \text{and} \quad (E - 1)^n f(a+h) = 0 \qquad \ldots(16.19a)$$

Expanding equations (16.19a) by binomial theorem and using (16.9), we finally get the estimates of missing values by solving these equations. The following examples will clarify the technique.


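Example 16.9. Estimate $u_2$ from the following table :

x :      1         2     3         4         5
u(x) :   2.0000    *     2.0646    2.0954    2.1253

State the necessary assumptions made.
[I.C.W.A. (Intermediate), Dec. 1980]

Solution. Since we are given four entries, u(x) may be regarded as a polynomial of 3rd degree so that

$$\Delta^3 u_x = \text{constant} \quad \Rightarrow \quad \Delta^4 u_x = 0, \ \text{for all } x \qquad \ldots(*)$$

In particular, $\Delta^4 u_1 = 0$ [Taking x = 1 in (*)]

$$\Rightarrow \quad (E - 1)^4 u_1 = 0$$
$$\Rightarrow \quad (E^4 - 4E^3 + 6E^2 - 4E + 1)\,u_1 = 0$$
$$\Rightarrow \quad u_5 - 4u_4 + 6u_3 - 4u_2 + u_1 = 0 \qquad [\text{Using } (16.9a)]$$
$$\Rightarrow \quad 2.1253 - 4 \times 2.0954 + 6 \times 2.0646 - 4u_2 + 2.0 = 0$$
$$\Rightarrow \quad 2.1253 - 8.3816 + 12.3876 - 4u_2 + 2.0 = 0$$
$$\Rightarrow \quad 4u_2 = 2.1253 + 12.3876 + 2.0 - 8.3816 = 16.5129 - 8.3816 = 8.1313$$
$$\Rightarrow \quad u_2 = \frac{8.1313}{4} = 2.0328$$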

Example 16.10. The following table gives the quantity of cement in thousands of tonnes manufactured each year. Find the missing term by a suitable algebraic method of interpolation.

Year :              1962    1964    1966    1968    1970    1972
Cement quantity :     44      90       ?     160     270     390

(Madras Uni. B. Com., April 1976)

Solution. Taking the year 1962 as origin, we are given :

x :      0     1     2     3      4      5
$y_x$ :  44    90    ?     160    270    390

Since five entries are given, we can assume $y_x$ to be a polynomial of 4th degree, so that fourth order differences of $y_x$ are constant and fifth order differences are zero, i.e.,

$$\Delta^5 y_x = 0 \ \text{for all values of } x. \qquad \ldots(*)$$

In particular, $\Delta^5 y_0 = 0$

$$\Rightarrow \quad (E - 1)^5 y_0 = 0$$
$$\Rightarrow \quad (E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1)\,y_0 = 0$$
$$\Rightarrow \quad y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \qquad \ldots(**)$$

Substituting the values of $y_5$, $y_4$, $y_3$, $y_1$, $y_0$ in (**), we get

$$390 - 5 \times 270 + 10 \times 160 - 10y_2 + 5 \times 90 - 44 = 0$$
$$\Rightarrow \quad 10y_2 = 1046 \quad \Rightarrow \quad y_2 = 104.6$$

Hence the estimated quantity of cement manufactured in 1966 is 104.6 thousand tonnes.

Aliter. Let the missing observation be 'a'. We have the difference table as given below :

TABLE OF DIFFERENCES

x       $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$    $\Delta^5 y_x$

1962     44
                  46
1964     90                  a - 136
                  a - 90                   386 - 3a
1966     a                   250 - 2a                    6a - 686
                  160 - a                  3a - 300                     1046 - 10a
1968    160                  a - 50                      360 - 4a
                  110                      60 - a
1970    270                  10
                  120
1972    390

Since fifth order differences are zero [See (*)], we have :

$$1046 - 10a = 0 \quad \Rightarrow \quad a = \frac{1046}{10} = 104.6$$

Hence the missing observation is 104.6 (thousand tonnes).
Example 16.11. Using any appropriate interpolation formula estimate the percentage number of criminals under 35 years from the data given below :

Age                  Percentage No. of criminals
Under 25 years             32.0
Under 30 years             47.3
Under 40 years             64.1
Under 45 years             69.3
Under 50 years             74.5

(Lucknow Uni. B. Com., 1982)

Solution. Let the percentage number of criminals under 35 years be a. Since we are given 5 entries, $y_x$ can be approximated by a polynomial of 4th degree so that

$$\Delta^4 y_x = \text{Constant} \quad \Rightarrow \quad \Delta^5 y_x = 0, \ \forall x \qquad \ldots(*)$$

DIFFERENCE TABLE

x     $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$    $\Delta^5 y_x$

25    32.0
               15.3
30    47.3                  a - 62.6
               a - 47.3                   174 - 3a
35    a                     111.4 - 2a                  6a - 344.3
               64.1 - a                   3a - 170.3                    573.5 - 10a
40    64.1                  a - 58.9                    229.2 - 4a
               5.2                        58.9 - a
45    69.3                  0
               5.2
50    74.5

Since $\Delta^5 y_x = 0$ [From (*)], we get

$$573.5 - 10a = 0 \quad \Rightarrow \quad a = \frac{573.5}{10} = 57.35$$

Hence the percentage number of criminals under 35 years is 57.35.

Example 16.12. Estimate $y_3$ from the following data :

x :      0    1    2    3    4
$y_x$ :  1    3    9    ?    81

and explain why the value obtained is different from that obtained by putting x = 3 in the expression $3^x$.

Solution. Since we are given four entries we can assume $y_x$ to be a polynomial of 3rd degree so that the fourth order differences of $y_x$ are zero.

$$\text{i.e.,} \quad \Delta^4 y_x = 0, \ \forall x \quad \Rightarrow \quad \Delta^4 y_0 = 0 \qquad \ldots(*)$$

Let the missing value be 'a'. Then the difference table is given below :

TABLE OF DIFFERENCES

x     $y_x$    $\Delta y_x$    $\Delta^2 y_x$    $\Delta^3 y_x$    $\Delta^4 y_x$

0      1
                2
1      3                   4
                6                      a - 19
2      9                   a - 15                    124 - 4a
                a - 9                  105 - 3a
3      a                   90 - 2a
                81 - a
4     81

Substituting from this table in (*) we get :

$$124 - 4a = 0 \quad \Rightarrow \quad a = \frac{124}{4} = 31$$

Hence the missing value $y_3 = 31$.

Now putting x = 3 in $3^x$, we have $y_3 = 3^3 = 27$. Obviously this value differs from the estimated value, viz., 31. The reason for it is that in estimating $y_3$ we assumed that $y_x$ can be expressed as a polynomial of third degree and hence fourth order differences are zero. But in this case ($y = 3^x$), the function is not a polynomial but is of exponential form. Since the basic assumption of interpolation is not satisfied, we get the difference between actual and estimated values.
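Examples 16.9 to 16.12 all apply equation (16.17) in the same way, so the step can be sketched once in code. The function name `fill_one_missing` and the use of `None` to mark the missing entry are assumptions of this sketch; the data are those of Examples 16.10 and 16.12.

```python
from fractions import Fraction
from math import comb


def fill_one_missing(entries):
    """Estimate the single missing entry (marked None) among equally spaced entries."""
    n = len(entries) - 1              # n known entries, so the nth differences vanish
    known_sum = Fraction(0)
    missing_coeff = None
    for k, value in enumerate(entries):
        # Coefficient of f(a + kh) in the expansion of (E - 1)^n f(a).
        coeff = (-1) ** (n - k) * comb(n, k)
        if value is None:
            missing_coeff = coeff
        else:
            known_sum += coeff * Fraction(str(value))
    # Solve  known_sum + missing_coeff * (missing value) = 0.
    return -known_sum / missing_coeff


cement = fill_one_missing([44, 90, None, 160, 270, 390])   # Example 16.10
powers = fill_one_missing([1, 3, 9, None, 81])             # Example 16.12

print(float(cement), float(powers))
```

The second call shows the point made in Example 16.12: the method returns the value a cubic through the known points would take, not the value of $3^x$.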


Example 16.13. From the following table, interpolate the missing values.

Year :                          0      1      2      3    4      5    6
Production (in '000 tonnes) :   200    220    260    ?    350    ?    430

(Kurukshetra Uni. B. Com., 1981)

Solution. Taking years as x and production (in '000 tonnes) as $y_x$, we are given :

x :      0      1      2      3        4      5        6
$y_x$ :  200    220    260    $y_3$    350    $y_5$    430

Since we are given five entries, we can assume that production ($y_x$) is a polynomial of 4th degree so that fourth order differences of $y_x$ are constant and consequently fifth order differences are zero, i.e., $\Delta^5 y_x = 0$ for all values of x.

$$\Rightarrow \quad (E - 1)^5 y_x = 0, \ \forall x \qquad \ldots(*)$$

Since we have to estimate two unknowns, viz., $y_3$ and $y_5$, we need two equations. Hence taking x = 0 and x = 1 in (*) we get respectively

$$(E - 1)^5 y_0 = 0 \quad \text{and} \quad (E - 1)^5 y_1 = 0$$

Now $(E - 1)^5 y_0 = 0$

$$\Rightarrow \quad (E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1)\,y_0 = 0$$
$$\Rightarrow \quad y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0$$
$$\text{i.e.,} \quad y_5 - 5 \times 350 + 10y_3 - 10 \times 260 + 5 \times 220 - 200 = 0$$
$$\Rightarrow \quad y_5 + 10y_3 = 3450 \qquad \ldots(i)$$

Similarly $(E - 1)^5 y_1 = 0$

$$\Rightarrow \quad y_6 - 5y_5 + 10y_4 - 10y_3 + 5y_2 - y_1 = 0$$
$$\Rightarrow \quad 430 - 5y_5 + 3500 - 10y_3 + 260 \times 5 - 220 = 0$$
$$\Rightarrow \quad 5y_5 + 10y_3 = 5010 \qquad \ldots(ii)$$

We have to solve (i) and (ii) for $y_3$ and $y_5$. Subtracting (i) from (ii) we get

$$4y_5 = 5010 - 3450 = 1560 \quad \Rightarrow \quad y_5 = \frac{1560}{4} = 390$$

Substituting the value of $y_5$ in (i) we get

$$390 + 10y_3 = 3450 \quad \Rightarrow \quad y_3 = \frac{3060}{10} = 306$$

Hence the estimated production figures (in '000 tonnes) for the years 3 and 5 are respectively 306 and 390.


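Example 16.14. Given : $u_{50} = 939$, $u_{52} = 907$, $u_{53} = 841$, $u_{55} = 773$, estimate $u_{51}$ and $u_{54}$ under suitable assumptions.
[I.C.W.A. (Intermediate), Dec. 1984]

Solution. We are given :

x :      50     51    52     53     54    55
$u_x$ :  939    ?     907    841    ?     773

Taking x = 50 as origin, the above table becomes

x :      0      1        2      3      4        5
$u_x$ :  939    $u_1$    907    841    $u_4$    773

Since we are given four entries, $u_x$ may be regarded as a polynomial of 3rd degree so that :

$$\Delta^3 u_x = \text{constant} \quad \Rightarrow \quad \Delta^4 u_x = 0 \quad \Rightarrow \quad (E - 1)^4 u_x = 0, \ \forall x \qquad \ldots(*)$$

Since we have to estimate two unknowns, viz., $u_1$ and $u_4$, we need two equations. Hence taking x = 0 and x = 1 in (*) we get respectively :

$$(E - 1)^4 u_0 = 0 \quad \text{and} \quad (E - 1)^4 u_1 = 0 \qquad \ldots(**)$$

Now $(E - 1)^4 u_0 = 0$

$$\Rightarrow \quad (E^4 - 4E^3 + 6E^2 - 4E + 1)\,u_0 = 0$$
$$\Rightarrow \quad u_4 - 4u_3 + 6u_2 - 4u_1 + u_0 = 0$$
$$\Rightarrow \quad u_4 - 4 \times 841 + 6 \times 907 - 4u_1 + 939 = 0$$
$$\Rightarrow \quad u_4 - 3364 + 5442 - 4u_1 + 939 = 0$$
$$\Rightarrow \quad u_4 - 4u_1 + 6381 - 3364 = 0$$
$$\Rightarrow \quad u_4 - 4u_1 + 3017 = 0 \qquad \ldots(i)$$

Similarly $(E - 1)^4 u_1 = 0$

$$\Rightarrow \quad (E^4 - 4E^3 + 6E^2 - 4E + 1)\,u_1 = 0$$
$$\Rightarrow \quad u_5 - 4u_4 + 6u_3 - 4u_2 + u_1 = 0$$
$$\Rightarrow \quad 773 - 4u_4 + 6 \times 841 - 4 \times 907 + u_1 = 0$$
$$\Rightarrow \quad 773 - 4u_4 + 5046 - 3628 + u_1 = 0$$
$$\Rightarrow \quad u_1 - 4u_4 + 5819 - 3628 = 0$$
$$\Rightarrow \quad u_1 - 4u_4 + 2191 = 0 \qquad \ldots(ii)$$

We have to solve (i) and (ii) for $u_1$ and $u_4$. From (ii) we get,

$$u_1 = 4u_4 - 2191 \qquad \ldots(iii)$$

Substituting the value of $u_1$ in (i), we get

$$u_4 - 4(4u_4 - 2191) + 3017 = 0$$
$$\Rightarrow \quad u_4 - 16u_4 + 8764 + 3017 = 0$$
$$\Rightarrow \quad -15u_4 + 11781 = 0$$
$$\Rightarrow \quad u_4 = \frac{11781}{15} = 785.4 \approx 785$$

Substituting the value of $u_4$ in (iii) we get

$$u_1 = 4 \times 785.4 - 2191 = 950.6 \approx 951$$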
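When two entries are missing, the two equations (16.19a) are linear in the two unknowns and can be solved mechanically. The sketch below builds both equations from the binomial coefficients and solves them by Cramer's rule; the function name `fill_two_missing` and the `None` markers are assumptions of the sketch, and the data are those of Examples 16.13 and 16.14.

```python
from fractions import Fraction
from math import comb


def fill_two_missing(entries):
    """Estimate two missing entries (marked None) among equally spaced entries."""
    n = len(entries) - 2                      # n known entries, so nth differences vanish
    i, j = [k for k, value in enumerate(entries) if value is None]
    rows = []
    for start in (0, 1):                      # delta^n f(a) = 0 and delta^n f(a+h) = 0
        ci = cj = const = Fraction(0)
        for k in range(n + 1):
            coeff = (-1) ** (n - k) * comb(n, k)
            position = start + k
            if position == i:
                ci += coeff
            elif position == j:
                cj += coeff
            else:
                const += coeff * Fraction(str(entries[position]))
        rows.append((ci, cj, const))          # ci*yi + cj*yj + const = 0
    (a1, b1, c1), (a2, b2, c2) = rows
    det = a1 * b2 - a2 * b1
    yi = (-c1 * b2 + b1 * c2) / det           # Cramer's rule
    yj = (-a1 * c2 + a2 * c1) / det
    return yi, yj


y3, y5 = fill_two_missing([200, 220, 260, None, 350, None, 430])   # Example 16.13
u1, u4 = fill_two_missing([939, None, 907, 841, None, 773])        # Example 16.14

print(y3, y5, float(u1), float(u4))
```

The sketch assumes the two equations are independent (non-zero determinant), which holds for both examples.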


EXERCISE 16.2

1. Explain the Binomial Expansion method for interpolating the missing observations, stating clearly the assumptions involved.

2. Explain the use of the operators $\Delta$ and E in estimating (i) one and (ii) two, missing observations. State clearly the assumptions involved.

3. Interpolate the index number for 1970 from the following table :

Years :          1968    1969    1971    1972
Index Number :    100     107     157     212

[Punjab Uni. B. Com. II, Sept. 1982]
Ans. 124
4. Obtain an estimate of the missing figure in the following table : 
х: 4 5 6 7 8 
fo: 3H 2:96 2 270 
[I.C.W.A. (Intermediate), Dec., 1983]
Ans. 2.85


5. Find the missing values in the following :
(Use Binomial method)

x :   2       3       4       5    6        7
y :   5.99    7.92    9.49    ?    12.59    14.07

[Kurukshetra Uni. B. Com., 1978]

Ans. 11.02

6. Find the missing value from the following figures by the Binomial Method of interpolation :

Year :    1970    1971    1972    1973    1974    1975
Value :    141     131     145       ?     149     173

[Kurukshetra Uni. B. Com., 1979]
Ans. 150.8


7. By using the most suitable method, estimate the business done in 1980 from the following data :

Years :                          1977    1978    1979    1981    1982
Business done in lakhs Rs. :     1570    2350    3650    5250    7800

(Punjab Uni. B. Com. II, April 1983)
Ans. Rs. 4470 lakhs


8. Interpolate the two missing figures with the help of a suitable formula.


Years : 1950 1951 1952 1953 1954 1955 1956 
Production 76:6 787 1 ТТ Ash ? 80°6 


(in Millions) 
[Mysore Uni. B. Com., April 1982] 
Ans. 7809; 805 (Millions) 


9. Estimate the production for the year 1955 and 1965 with the help of 
following table : 


Year : 1940 1945 1950 1955 1960 1965 1970 
Production : 20 22 26 — 35 — 43 
(in tonnes)


[Kurukshetra Uni. B. Com. II, April 1982] 
Ans. 35, 39 (tonnes) 


10. The number of members of International Statistical Society are :

Year :             1970    1971    1972    1973    1974    1975    1976    1977    1978    1979
No. of Members :    845     867       ?     846     821     772       ?     757     761     796

Make the best estimate you can of the members in 1972 and 1976.

[Punjab Uni. B. Com., 1980]
Ans. 844, 746.


11. Estimate $U_2$ from the following table :

x :      1    2    3     4     5
$U_x$ :  7    ?    13    21    37

and explain why the value obtained is different from that obtained by putting x = 2 in the expression $2^x + 5$.
Ans. $U_2 = 9.5$ ; Actual value = 9.


16.10. Interpolation with Arguments at Unequal Intervals. The techniques of interpolation discussed so far are applicable only if the values of the independent variable x are given at equal intervals. In other words, the Binomial Expansion Method and Newton's Difference Formulae (Forward or Backward) can be used if we are given the entries corresponding to equidistant values of the arguments. However, these formulae cannot be used if the values of the arguments are given at unequal intervals, since in that case the operators E, $\Delta$ and $\nabla$ do not serve our purpose. In such cases, when the arguments are not equally spaced, the special techniques given below are used :

(i) Newton's Divided Difference Formula,
(ii) Lagrange's Formula.

Remark. It should be clearly understood that these two methods can be used even if the arguments are equally spaced, though in that case the calculations may be slightly more as compared to Newton's Forward or Backward Difference formula. In practice these methods are usually used when the arguments are not at equal intervals.


16.11. Divided Differences. As already pointed out, in case the
arguments are not equally spaced the operators Δ, ∇ and E cannot
be used. In case of equally spaced arguments, while forming the
difference table we considered only the differences between the
successive values of the entries, without paying any attention to the
corresponding differences between the arguments. The differences
defined on taking into consideration the changes in the values of the
arguments are known as divided differences.


Let the values of the variable x (arguments) be $a_0, a_1, a_2, \ldots, a_n$,
which may not necessarily be equally spaced, and let $f(a_0), f(a_1), \ldots,
f(a_n)$ be the corresponding entries. Then the first order divided
difference of f(x) for the arguments $a_0$ and $a_1$, usually denoted by
$f(a_0, a_1)$ or $\underset{a_1}{\Delta} f(a_0)$, is defined as :

$$\underset{a_1}{\Delta} f(a_0) = f(a_0, a_1) = \frac{f(a_1) - f(a_0)}{a_1 - a_0}$$

In other words, the divided difference $\underset{a_1}{\Delta} U_{a_0}$ is nothing but
the ordinary difference $\Delta U_{a_0}$ divided by the corresponding difference between
the arguments. The table on page 856 gives the divided differences
up to 4th order in case of five entries corresponding to the arguments
$a_0, a_1, a_2, a_3$ and $a_4$.


Remark. If f(x) is a polynomial of nth degree then the nth order
divided differences of f(x) are constant and higher order differences
are zero. Mathematically,

$$\Delta^{n} f(x) = \text{Constant} \quad \text{and} \quad \Delta^{r} f(x) = 0, \; r > n \qquad \ldots(16.21)$$

provided f(x) is a polynomial of degree n.
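
The remark above can be checked numerically. The following is a minimal sketch, not part of the original text: the helper name `divided_difference_table` and the sample arguments 0, 1, 3, 4, 7 are illustrative assumptions. It builds the columns of divided differences for f(x) = x³ at unequally spaced arguments and shows that the third order differences are constant while the fourth order difference vanishes.

```python
from fractions import Fraction

def divided_difference_table(xs, ys):
    """Return columns of divided differences.

    table[k][i] holds the k-th order divided difference f(x_i, ..., x_{i+k});
    table[0] is simply the list of entries.
    """
    table = [[Fraction(y) for y in ys]]
    n = len(xs)
    for k in range(1, n):
        prev = table[-1]
        # Difference of neighbouring lower-order values, divided by the
        # spread of the arguments they cover.
        col = [(prev[i + 1] - prev[i]) / Fraction(xs[i + k] - xs[i])
               for i in range(n - k)]
        table.append(col)
    return table

# Hypothetical data: a cubic tabulated at unequal intervals.
xs = [0, 1, 3, 4, 7]
ys = [x ** 3 for x in xs]
table = divided_difference_table(xs, ys)
print(table[3])  # third order divided differences (constant for a cubic)
print(table[4])  # fourth order divided difference (zero for a cubic)
```

Exact fractions are used here only so that "constant" and "zero" are not blurred by floating point rounding.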

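16.11.1. Newton's Divided Difference Formula. Let $f(a_0), f(a_1), \ldots,
f(a_n)$ be n+1 entries corresponding to the arguments $a_0, a_1,
a_2, \ldots, a_n$, not necessarily equally spaced. Then Newton's divided
difference (D.D.) formula gives the form of the function
f(x) as :

$$f(x) = f(a_0) + (x-a_0)\,\Delta f(a_0) + (x-a_0)(x-a_1)\,\Delta^{2} f(a_0) + (x-a_0)(x-a_1)(x-a_2)\,\Delta^{3} f(a_0) + \cdots + (x-a_0)(x-a_1)\cdots(x-a_{n-1})\,\Delta^{n} f(a_0) \qquad \ldots(16.22)$$

where $\Delta^{r} f(a_0)$ denotes the rth order divided difference $f(a_0, a_1, \ldots, a_r)$.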


Remarks. 1. The formula (16.22) has also been obtained under
the assumptions discussed in § 16.1.1, viz., f(x) can be represented by
a polynomial of appropriate degree, depending on the number of
entries given.

2. This formula can also be used even if the arguments are
at equal intervals, though in practice it is generally used when the
arguments are at unequal intervals.

We discuss below some numerical problems to illustrate the
use of this formula for interpolation.


Example 16.15. The observed values of a function are respectively
168, 120, 72 and 63 at the four positions 3, 7, 9 and 10 of the
independent variable. What is the best estimate you can give for the
value of the function at the position 6 of the independent variable ?


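Solution.          DIVIDED DIFFERENCE TABLE

 x     f(x)    Δf(x)                     Δ²f(x)                       Δ³f(x)
 3     168
               (120-168)/(7-3) = -12
 7     120                               (-24-(-12))/(9-3) = -2
               (72-120)/(9-7) = -24                                   (5-(-2))/(10-3) = 1
 9      72                               (-9-(-24))/(10-7) = 5
               (63-72)/(10-9) = -9
10      63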
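TABLE OF DIVIDED DIFFERENCES (five arguments, up to 4th order)

Argument  Entry    1st order     2nd order         3rd order             4th order
a_0       f(a_0)
                   f(a_0,a_1)
a_1       f(a_1)                 f(a_0,a_1,a_2)
                   f(a_1,a_2)                      f(a_0,a_1,a_2,a_3)
a_2       f(a_2)                 f(a_1,a_2,a_3)                          f(a_0,a_1,a_2,a_3,a_4)
                   f(a_2,a_3)                      f(a_1,a_2,a_3,a_4)
a_3       f(a_3)                 f(a_2,a_3,a_4)
                   f(a_3,a_4)
a_4       f(a_4)

where f(a_0,a_1) = [f(a_1) - f(a_0)] / (a_1 - a_0),
f(a_0,a_1,a_2) = [f(a_1,a_2) - f(a_0,a_1)] / (a_2 - a_0), and so on.
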
Hence by Newton's divided difference formula (16.22), we get

$$f(x) = 168 + (x-3)\times(-12) + (x-3)(x-7)\times(-2) + (x-3)(x-7)(x-9)\times 1$$
$$= x^{3} - 21x^{2} + 119x - 27 \qquad \text{[On simplification]}$$

∴ The estimated value of the function at the point x = 6 is given by

$$f(6) = 6^{3} - 21\times 6^{2} + 119\times 6 - 27 = 216 - 756 + 714 - 27 = 147$$


Example 16.16. Apply Newton's Divided Difference Method
to find the number of persons getting Rs. 6 from the following data.

Income per day :   3    5    7    8   10
No. of persons : 180  154  120  110   90
Solution.

DIVIDED DIFFERENCE TABLE

 x     f(x)    Δf(x)    Δ²f(x)    Δ³f(x)    Δ⁴f(x)
 3     180
               -13
 5     154               -1
               -17                 0.67
 7     120               2.33               -0.16
               -10                -0.47
 8     110               0
               -10
10      90

Using Newton's divided difference formula, we have :

$$f(x) = 180 + (x-3)\times(-13) + (x-3)(x-5)\times(-1) + (x-3)(x-5)(x-7)\times 0.67 + (x-3)(x-5)(x-7)(x-8)\times(-0.16)$$

Putting x = 6, we get :

$$f(6) = 180 + (6-3)\times(-13) + (6-3)(6-5)\times(-1) + (6-3)(6-5)(6-7)\times(0.67) + (6-3)(6-5)(6-7)(6-8)\times(-0.16)$$
$$= 135 \qquad \text{[On simplification]}$$

Hence the number of persons getting Rs. 6 per day is 135.
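
The two worked examples above can be reproduced with a short program. This is a minimal sketch under stated assumptions, not the book's own procedure: the function names `dd_coefficients` and `newton_evaluate` are hypothetical, and exact fractions are used in place of the two-decimal rounding of the tables. It computes the leading divided differences and then evaluates formula (16.22).

```python
from fractions import Fraction

def dd_coefficients(xs, ys):
    """Leading divided differences f(a_0), f(a_0,a_1), ..., f(a_0,...,a_n)."""
    coeffs = [Fraction(y) for y in ys]
    n = len(xs)
    for k in range(1, n):
        # Work from the bottom up so lower-order values are still available.
        for i in range(n - 1, k - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - k])
    return coeffs

def newton_evaluate(xs, ys, x):
    """Evaluate Newton's divided difference polynomial at x."""
    coeffs = dd_coefficients(xs, ys)
    result = Fraction(0)
    product = Fraction(1)  # running product (x - a_0)(x - a_1)...
    for k, c in enumerate(coeffs):
        result += c * product
        product *= (x - xs[k])
    return result

# Example 16.15: entries 168, 120, 72, 63 at arguments 3, 7, 9, 10.
print(dd_coefficients([3, 7, 9, 10], [168, 120, 72, 63]))
print(newton_evaluate([3, 7, 9, 10], [168, 120, 72, 63], 6))

# Example 16.16: income data, estimate at Rs. 6.
print(float(newton_evaluate([3, 5, 7, 8, 10], [180, 154, 120, 110, 90], 6)))
```

With exact arithmetic the second estimate comes out slightly above 135, which rounds to the figure obtained in the text.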
16.12. Lagrange's Formula. Another formula used to interpolate
values when the arguments are not necessarily equally
spaced was given by the French mathematician Lagrange and is known
as Lagrange's interpolation formula.
If $a_0, a_1, a_2, \ldots, a_n$ are (n+1) arguments not necessarily at
equal intervals and $f(a_0), f(a_1), \ldots, f(a_n)$ are the corresponding
entries, then under the assumptions stated in § 16.1.1, Lagrange's
form of the function f(x) is given by :

$$f(x) = \frac{(x-a_1)(x-a_2)\cdots(x-a_n)}{(a_0-a_1)(a_0-a_2)\cdots(a_0-a_n)}\, f(a_0) + \frac{(x-a_0)(x-a_2)\cdots(x-a_n)}{(a_1-a_0)(a_1-a_2)\cdots(a_1-a_n)}\, f(a_1) + \cdots + \frac{(x-a_0)(x-a_1)\cdots(x-a_{n-1})}{(a_n-a_0)(a_n-a_1)\cdots(a_n-a_{n-1})}\, f(a_n) \qquad \ldots(16.23)$$

Remark. While dealing with arguments not equally spaced
we can use any one of the formulae :

(i) Newton's Divided Difference Formula,
(ii) Lagrange's Formula,

for determining the form of the function f(x) and hence for estimating
the value of f(x) for a given value of x. Both these formulae can
be used even in the case of equally spaced arguments, though in
that case Newton's (forward and backward) difference formulae
are much easier to apply.


We discuss below numerical problems to illustrate the use of
Lagrange's interpolation formula.

Example 16.17. Find the polynomial function f(x) given that
f(0) = 2, f(1) = 3, f(2) = 12 and f(3) = 35. Hence find f(5).
[I.C.W.A. (Intermediate), June 1984]


Solution. By Lagrange's formula of interpolation for four
arguments 0, 1, 2 and 3, we get :

$$f(x) = \frac{(x-1)(x-2)(x-3)}{(0-1)(0-2)(0-3)} \times 2 + \frac{(x-0)(x-2)(x-3)}{(1-0)(1-2)(1-3)} \times 3 + \frac{(x-0)(x-1)(x-3)}{(2-0)(2-1)(2-3)} \times 12 + \frac{(x-0)(x-1)(x-2)}{(3-0)(3-1)(3-2)} \times 35$$

$$= -\tfrac{1}{3}\left(x^{3} - 6x^{2} + 11x - 6\right) + \tfrac{3}{2}\left(x^{3} - 5x^{2} + 6x\right) - 6\left(x^{3} - 4x^{2} + 3x\right) + \tfrac{35}{6}\left(x^{3} - 3x^{2} + 2x\right)$$

$$= \left(-\tfrac{1}{3} + \tfrac{3}{2} - 6 + \tfrac{35}{6}\right)x^{3} + \left(2 - \tfrac{15}{2} + 24 - \tfrac{35}{2}\right)x^{2} + \left(-\tfrac{11}{3} + 9 - 18 + \tfrac{35}{3}\right)x + 2$$

$$\Rightarrow \quad f(x) = x^{3} + x^{2} - x + 2 \qquad \ldots(*)$$

Taking x = 5 in (*) we get :

$$f(5) = 5^{3} + 5^{2} - 5 + 2 = 125 + 25 - 5 + 2 = 147$$
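
The result of Example 16.17 can be checked directly from formula (16.23) without expanding the polynomial. The sketch below is illustrative only; the function name `lagrange` is an assumption, and integer arguments are assumed so that exact fractions can be formed.

```python
from fractions import Fraction

def lagrange(xs, ys, x):
    """Evaluate Lagrange's interpolation formula at x (integer arguments assumed)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                # Each factor is (x - a_j) / (a_i - a_j), j different from i.
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

xs = [0, 1, 2, 3]
ys = [2, 3, 12, 35]
print(lagrange(xs, ys, 5))  # value of the interpolating cubic at x = 5

# The interpolating cubic agrees with x^3 + x^2 - x + 2 at other points too.
print(all(lagrange(xs, ys, t) == t**3 + t**2 - t + 2 for t in range(-3, 8)))
```

Comparing against x³ + x² − x + 2 at several points is a quick way to confirm the simplification in (*).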
Example 16.18. Given $\log_{10} 654 = 2.8156$, $\log_{10} 658 = 2.8182$,
$\log_{10} 659 = 2.8189$, $\log_{10} 661 = 2.8202$, find by Lagrange's interpolation
formula $\log_{10} 656$. [Retain four decimal places in your answer.]
[I.C.W.A. (Intermediate), June 1978]


Solution. Here $f(x) = \log_{10} x$. The values of the arguments and
the corresponding entries are as given below :

x                    : a_0 = 654   a_1 = 658   a_2 = 659   a_3 = 661
f(x) = log_10 x      :  2.8156      2.8182      2.8189      2.8202

We want f(x) when x = 656, viz., $f(656) = \log_{10} 656$.
Taking x = 656 in Lagrange's formula (16.23), and using the
above table we get

$$\log_{10} 656 = f(656) = \frac{(656-658)(656-659)(656-661)}{(654-658)(654-659)(654-661)} \times 2.8156 + \frac{(656-654)(656-659)(656-661)}{(658-654)(658-659)(658-661)} \times 2.8182 + \frac{(656-654)(656-658)(656-661)}{(659-654)(659-658)(659-661)} \times 2.8189 + \frac{(656-654)(656-658)(656-659)}{(661-654)(661-658)(661-659)} \times 2.8202$$

$$= \frac{(-2)(-3)(-5)}{(-4)(-5)(-7)} \times 2.8156 + \frac{2(-3)(-5)}{4(-1)(-3)} \times 2.8182 + \frac{2(-2)(-5)}{5 \cdot 1 \cdot (-2)} \times 2.8189 + \frac{2(-2)(-3)}{7 \cdot 3 \cdot 2} \times 2.8202$$

$$= \tfrac{3}{14} \times 2.8156 + \tfrac{5}{2} \times 2.8182 - 2 \times 2.8189 + \tfrac{2}{7} \times 2.8202$$

$$= 0.6033 + 7.0455 - 5.6378 + 0.8058 = 2.8168$$
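
A useful arithmetic check in such problems is that the Lagrangian coefficients multiplying the entries always add up to 1. The following sketch (the helper name `lagrange_weights` is an assumption, not from the text) computes the coefficients for Example 16.18 exactly and then forms the weighted sum of the four logarithms.

```python
from fractions import Fraction

def lagrange_weights(xs, x):
    """Coefficients of f(a_0), ..., f(a_n) in Lagrange's formula at x."""
    weights = []
    for i, xi in enumerate(xs):
        w = Fraction(1)
        for j, xj in enumerate(xs):
            if j != i:
                w *= Fraction(x - xj, xi - xj)
        weights.append(w)
    return weights

xs = [654, 658, 659, 661]
# Four-figure logarithms as given in the example, stored exactly.
logs = [Fraction("2.8156"), Fraction("2.8182"),
        Fraction("2.8189"), Fraction("2.8202")]

weights = lagrange_weights(xs, 656)
estimate = sum(w * y for w, y in zip(weights, logs))

print(weights)                    # 3/14, 5/2, -2, 2/7
print(sum(weights))               # the coefficients add up to 1
print(round(float(estimate), 4))  # estimate of log10(656) to four places
```

If the coefficients do not sum to 1, a sign or a factor has been slipped somewhere in the hand computation.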
Remark. If we are asked to find the value of a function f(x)
at some point by using Lagrange's formula, then it is not necessary
to simplify the function f(x) as a polynomial in x unless we are
asked to find the form of the function f(x). The required result is
obtained on substituting the value of x in the formula (16.23).


Example 16.19. By using Lagrange's method, estimate the
number of persons whose income is Rs. 19 and more but does not
exceed Rs. 25 from the following table :

Income in Rs.                  No. of persons
 1 and not exceeding  9              50
 9 and not exceeding 19              70
19 and not exceeding 28             203
28 and not exceeding 37             406
37 and not exceeding 46             304
(Rajasthan Uni. M. Com., 1976)


Solution. Let us denote the number of persons earning
below Rs. x by $U_x$. Then the given data can be transformed into
a "less than" cumulative frequency table as follows :

x     : a_0 = 9   a_1 = 19   a_2 = 28   a_3 = 37   a_4 = 46
$U_x$ :   50        120        323        729       1033

Using Lagrange's formula for five arguments $a_0, a_1, a_2, a_3, a_4$
we get, on taking x = 25 :

$$U_{25} = \frac{(25-19)(25-28)(25-37)(25-46)}{(9-19)(9-28)(9-37)(9-46)} \times 50 + \frac{(25-9)(25-28)(25-37)(25-46)}{(19-9)(19-28)(19-37)(19-46)} \times 120 + \frac{(25-9)(25-19)(25-37)(25-46)}{(28-9)(28-19)(28-37)(28-46)} \times 323 + \frac{(25-9)(25-19)(25-28)(25-46)}{(37-9)(37-19)(37-28)(37-46)} \times 729 + \frac{(25-9)(25-19)(25-28)(25-37)}{(46-9)(46-19)(46-28)(46-37)} \times 1033$$

$$= \frac{6(-3)(-12)(-21)}{(-10)(-19)(-28)(-37)} \times 50 + \frac{16(-3)(-12)(-21)}{10(-9)(-18)(-27)} \times 120 + \frac{16 \cdot 6(-12)(-21)}{19 \cdot 9(-9)(-18)} \times 323 + \frac{16 \cdot 6(-3)(-21)}{28 \cdot 18 \cdot 9(-9)} \times 729 + \frac{16 \cdot 6(-3)(-12)}{37 \cdot 27 \cdot 18 \cdot 9} \times 1033$$

$$= -1.1522 + 33.1852 + 282.0741 - 108 + 22.0594 = 228.1665 \approx 228$$

The required number of persons whose income is Rs. 19 and
more, but does not exceed Rs. 25, is given by

$$U_{25} - U_{19} = 228 - 120 = 108$$
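
The steps of Example 16.19 (cumulate the frequencies, interpolate the cumulative total at 25, subtract the total up to 19) can be sketched as follows. The names `lagrange`, `upper_limits` and `frequencies` are illustrative assumptions; the data are those of the example.

```python
from fractions import Fraction
from itertools import accumulate

def lagrange(xs, ys, x):
    """Evaluate Lagrange's interpolation formula at x (integer arguments assumed)."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

upper_limits = [9, 19, 28, 37, 46]          # upper class limits
frequencies = [50, 70, 203, 406, 304]       # class frequencies
cumulative = list(accumulate(frequencies))  # "less than" cumulative totals

u25 = lagrange(upper_limits, cumulative, 25)  # estimated persons up to Rs. 25
answer = round(u25) - cumulative[1]           # subtract persons up to Rs. 19

print(cumulative)
print(float(u25))
print(answer)
```

Interpolating the cumulative totals rather than the class frequencies is what makes the class boundaries usable as arguments.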
16.13. Inverse Interpolation. So far we were given a set
of values of x (arguments) and the corresponding values of y = f(x)
(entries), and we were required to estimate y = f(x) for some specified
value of x. Let us now consider the reverse problem stated below :

"Given a set of values of x and y = f(x), we are interested to
find the value of x for a certain value of y". This is termed as
inverse interpolation.

The formula for inverse interpolation is obtained from Lagrange's
interpolation formula by interchanging the variables x and y = f(x).
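Thus for four arguments $a_0, a_1, a_2$ and $a_3$, the value of x
corresponding to the given value of f(x) is given by the formula :

$$x = \frac{[f(x)-f(a_1)][f(x)-f(a_2)][f(x)-f(a_3)]}{[f(a_0)-f(a_1)][f(a_0)-f(a_2)][f(a_0)-f(a_3)]} \times a_0 + \frac{[f(x)-f(a_0)][f(x)-f(a_2)][f(x)-f(a_3)]}{[f(a_1)-f(a_0)][f(a_1)-f(a_2)][f(a_1)-f(a_3)]} \times a_1 + \frac{[f(x)-f(a_0)][f(x)-f(a_1)][f(x)-f(a_3)]}{[f(a_2)-f(a_0)][f(a_2)-f(a_1)][f(a_2)-f(a_3)]} \times a_2 + \frac{[f(x)-f(a_0)][f(x)-f(a_1)][f(x)-f(a_2)]}{[f(a_3)-f(a_0)][f(a_3)-f(a_1)][f(a_3)-f(a_2)]} \times a_3 \qquad \ldots(16.24)$$

Example 16.20. The values of x and y = f(x) are given below :

x    :  5    6    9   11
f(x) : 12   13   14   16

Find the value of x when f(x) = 15.

Solution. In the usual notations of Lagrange's formula :

x        : a_0 = 5   a_1 = 6   a_2 = 9   a_3 = 11
y = f(x) :   12        13        14         16

We want x when f(x) = 15.
Using the inverse interpolation formula (16.24), we get :

$$x = \frac{(15-13)(15-14)(15-16)}{(12-13)(12-14)(12-16)} \times 5 + \frac{(15-12)(15-14)(15-16)}{(13-12)(13-14)(13-16)} \times 6 + \frac{(15-12)(15-13)(15-16)}{(14-12)(14-13)(14-16)} \times 9 + \frac{(15-12)(15-13)(15-14)}{(16-12)(16-13)(16-14)} \times 11$$

$$= \tfrac{5}{4} - 6 + \tfrac{27}{2} + \tfrac{11}{4} = 11.5$$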

EXERCISE 16.3


1. Explain the interpolation methods used for interpolating the values
of the dependent variable (entries) when the values of the independent variable
(arguments) are not at equal intervals.

2. What do you understand by a divided difference ? Complete the
divided difference table for four arguments a_0, a_1, a_2, a_3 and the corresponding
entries f(a_0), f(a_1), f(a_2) and f(a_3).

3. State Newton's divided difference formula and give :
(i) the assumptions on which it is based ; (ii) its uses.

4. Find the polynomial f(x) of the lowest possible degree which assumes the
values 3, 12, 15, -21 when x has the values 3, 2, 1, -1 respectively, by using
Newton's divided difference formula.

Ans. f(x) = x³ - 9x² + 17x + 6

5. Construct a divided difference table for the following :

x    :  1    2    4    7    12
f(x) : 22   30   82  106   216

Ans. Leading D.D. are : 8, 6, -1.6, 0.194.


6. By means of Newton's divided difference formula find the values of
f(8) and f(15) from the following table :

x    :  4     5     7    10    11    13
f(x) : 48   100   294   900  1210  2028

Ans. 448, 3150


7. State Lagrange's formula of interpolation and give its assumptions
and uses.

8. Use Lagrange's interpolation formula to find the value of f(0) from
the following table.

x    : -1   -2    2    4
f(x) : -1   -9   11   69

[I.C.W.A. (Intermediate), Dec. 1979]
Ans. f(0) = 1

9. Determine the percentage of criminals under 35 years of age.

Age                Percentage of criminals
Under 25 years            52.0
  „   30   „              67.3
  „   40   „              84.1
  „   50   „              94.4

[Punjab Uni. B.Com. 1973 ; Nagpur Uni. B.Com. 1974]
Ans. 77.405%
10. Given log_10 654 = 2.8156 ; log_10 658 = 2.8182 ;
log_10 659 = 2.8189 ; log_10 661 = 2.8202.
Find log_10 656 using two different interpolation formulae available for
observations at unequal intervals, say, Lagrange's formula and Newton's
formula for divided differences.

Ans. 2.8168 (By both methods).


11. State Lagrange's interpolation formula. Given the table of
values :

x    : 35.0   35.5   39.5   40.5
f(x) : 1175   1280   2180   2020

obtain a value of f(40).
[I.C.W.A. (Intermediate), Dec. 1983]

Ans. 2136


12. State Lagrange's interpolation formula and mention its difference
in use from Newton's interpolation formula.

Given the table of values :

x    : 1.40     1.60     1.70     1.80
f(x) : 0.9855   0.9995   0.9917   0.9737

[I.C.W.A. (Intermediate), Dec. 1984]
Ans. 0.9840
13. The following table gives the normal weight of a baby during the
first six months of life :

Age in months : 0   2   3    5    6
Weight in lbs : 5   7   8   10   12

Estimate the weight of the baby at the age of 4 months.

[I.C.W.A. (Final), Jan. 1970 (O.S.)]
Ans. 8.89 lbs [By Lagrange's method].

14. Using Lagrange's formula of interpolation, find from the data
given below the number of workers earning between Rs. 30 and Rs. 40 :

Earnings in Rs. : 15—20   20—30   30—45   45—55   55—70
No. of workers  :   73      97     110     180     140

(Guru Nanak Dev Uni. B.Com., 1981)
15. The observed values of a function are respectively 168, 120, 72
and 63 at the four positions 3, 7, 9 and 10 of the independent variable.
What is the estimate you can give for the value of the function at the
position 6 of the independent variable ? Use Lagrange's formula.

Ans. 147.

16. What do you understand by Inverse Interpolation ? Explain how
Lagrange's interpolation formula can be used in this respect.

17. A function f(x) takes the values as given in the following table :

x    : 1    3    4
f(x) : 4   12   19

Find the value of x so that f(x) = 7.

Ans. 1.86.

TABLE II
ANTI-LOGARITHMS

TABLE III
POWERS, ROOTS AND RECIPROCALS

TABLE IV—BINOMIAL COEFFICIENTS

 n  C(n,0) C(n,1) C(n,2) C(n,3) C(n,4) C(n,5) C(n,6) C(n,7) C(n,8) C(n,9) C(n,10)
 1     1      1
 2     1      2      1
 3     1      3      3      1
 4     1      4      6      4      1
 5     1      5     10     10      5      1
 6     1      6     15     20     15      6      1
 7     1      7     21     35     35     21      7      1
 8     1      8     28     56     70     56     28      8      1
 9     1      9     36     84    126    126     84     36      9      1
10     1     10     45    120    210    252    210    120     45     10       1
11     1     11     55    165    330    462    462    330    165     55      11
12     1     12     66    220    495    792    924    792    495    220      66
13     1     13     78    286    715   1287   1716   1716   1287    715     286
14     1     14     91    364   1001   2002   3003   3432   3003   2002    1001
15     1     15    105    455   1365   3003   5005   6435   6435   5005    3003
16     1     16    120    560   1820   4368   8008  11440  12870  11440    8008
17     1     17    136    680   2380   6188  12376  19448  24310  24310   19448
18     1     18    153    816   3060   8568  18564  31824  43758  48620   43758
19     1     19    171    969   3876  11628  27132  50388  75582  92378   92378
20     1     20    190   1140   4845  15504  38760  77520 125970 167960  184756
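
Any row of the table can be regenerated, or a doubtful entry checked, with Python's standard `math.comb` (available from Python 3.8). The helper name `binomial_row` and the cut-off at C(n,10), which mirrors the layout of the table, are illustrative assumptions.

```python
from math import comb

def binomial_row(n, max_r=10):
    """Binomial coefficients C(n,0), ..., C(n,min(n, max_r))."""
    return [comb(n, r) for r in range(min(n, max_r) + 1)]

print(binomial_row(7))
print(binomial_row(20))
```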


TABLE V—VALUES OF e^(-m)
(0 < m < 1)

m       :    1       2       3       4        5        6        7        8        9       10
e^(-m)  : .36788  .13534  .04979  .01832  .006738  .002479  .000912  .000335  .000123  .000045

Note. To obtain values of e^(-m) for other values of m, use the laws of
exponents.

Example. e^(-2.35) = (e^(-2.00))(e^(-0.35)) = (.13534)(.7047) = .095374
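
The law-of-exponents device in the example above is easy to verify. This is a small sketch, assuming only the two tabulated values quoted in the example; the variable names are illustrative.

```python
import math

# Tabulated values quoted in the example above.
e_minus_2 = 0.13534     # e^(-2.00)
e_minus_035 = 0.7047    # e^(-0.35)

from_table = e_minus_2 * e_minus_035   # product of the two tabulated factors
direct = math.exp(-2.35)               # direct evaluation for comparison

print(round(from_table, 6))
print(round(direct, 6))
```

The two figures agree to about five decimal places; the small gap comes from the rounding of the tabulated factors.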
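TABLE VI—AREAS UNDER NORMAL CURVE
Normal probability curve is given by the equation

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right], \quad -\infty < x < \infty$$

and standard normal probability curve is given by :

$$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-z^{2}/2\right), \quad -\infty < z < \infty$$

where $Z = \dfrac{X-\mu}{\sigma}$.

The following table gives the shaded area in the diagram, viz.,
P(0 < Z < z), for different values of z.

[Diagram: normal curve with the area between X = μ (Z = 0) and X = x (Z = z) shaded.]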
TABLE VII—SIGNIFICANT VALUES OF CHI-SQUARE
DISTRIBUTION (RIGHT-TAIL AREAS)
P[χ² > χ²_α(ν)] = α

[Diagram: χ²-distribution curve with the rejection region (α) shaded in the right tail.]

Note. For degrees of freedom (ν) greater than 30, the quantity
√(2χ²) - √(2ν-1) may be regarded as a standard normal variate,
i.e., Z = √(2χ²) - √(2ν-1) ~ N(0, 1).

TABLE VIII—SIGNIFICANT VALUES OF t-DISTRIBUTION
(TWO-TAIL AREAS). P(|t| > t_ν(α)) = α

[Diagram: t-distribution curve centred at t = 0, with the acceptance region between -t_ν(α) and t_ν(α) and a rejection region (α/2) in each tail.]

d.f.              Probability (α)
(ν)     0.50    0.10    0.05    0.02    0.01    0.001
  1     1.00    6.31   12.71   31.82   63.66   636.62
  2     0.82    2.92    4.30    6.97    9.93    31.60
  3     0.77    2.35    3.18    4.54    5.84    12.94
  4     0.74    2.13    2.78    3.75    4.60     8.61
  5     0.73    2.02    2.57    3.37    4.03     6.86
  6     0.72    1.94    2.45    3.14    3.71     5.96
  7     0.71    1.90    2.37    3.00    3.50     5.41
  8     0.71    1.86    2.31    2.90    3.36     5.04
  9     0.70    1.83    2.26    2.82    3.25     4.78
 10     0.70    1.81    2.23    2.76    3.17     4.59
 11     0.70    1.80    2.20    2.72    3.11     4.44
 12     0.70    1.78    2.18    2.68    3.05     4.32
 13     0.69    1.77    2.16    2.65    3.01     4.22
 14     0.69    1.76    2.15    2.62    2.98     4.14
 15     0.69    1.75    2.13    2.60    2.95     4.07
 16     0.69    1.75    2.12    2.58    2.92     4.02
 17     0.69    1.74    2.11    2.57    2.90     3.97
 18     0.69    1.73    2.10    2.55    2.88     3.92
 19     0.69    1.73    2.09    2.54    2.86     3.88
 20     0.69    1.73    2.09    2.53    2.85     3.85
 21     0.69    1.72    2.08    2.52    2.83     3.83
 22     0.69    1.72    2.07    2.51    2.82     3.79
 23     0.69    1.71    2.07    2.50    2.81     3.77
 24     0.69    1.71    2.06    2.49    2.80     3.75
 25     0.68    1.71    2.06    2.49    2.79     3.73
 26     0.68    1.71    2.06    2.48    2.78     3.71
 27     0.68    1.70    2.05    2.47    2.77     3.69
 28     0.68    1.70    2.05    2.47    2.76     3.67
 29     0.68    1.70    2.05    2.46    2.76     3.66
 30     0.68    1.70    2.04    2.46    2.75     3.65
  ∞     0.67    1.65    1.96    2.33    2.58     3.29


Remark. The significant values for a one-tail test are obtained
on dividing the level of significance α in the above table by 2.
[This is because of the 'symmetry property' of the t-distribution.] For
example, for a one-tail test the significant values at 5% and 1% level
of significance will be given by the 2nd and 4th columns respectively.

TABLE IX A—1% SIGNIFICANT VALUES OF F
(VARIANCE RATIO) DISTRIBUTION
(RIGHT-TAIL AREAS)

ν1 = d.f. for numerator ; ν2 = d.f. for denominator.

ν2 \ ν1     1      2      3      4      5      6      8     12     24      ∞
   1      4052   4999   5403   5625   5764   5859   5982   6106   6234   6366
   2     98.50  99.00  99.17  99.25  99.30  99.33  99.37  99.42  99.46  99.50
   3     34.12  30.82  29.46  28.71  28.24  27.91  27.49  27.05  26.60  26.12
   4     21.20  18.00  16.69  15.98  15.52  15.21  14.80  14.37  13.93  13.46
   5     16.26  13.27  12.06  11.39  10.97  10.67  10.29   9.89   9.47   9.02
   6     13.74  10.92   9.78   9.15   8.75   8.47   8.10   7.72   7.31   6.88
   7     12.25   9.55   8.45   7.85   7.46   7.19   6.84   6.47   6.07   5.65
   8     11.26   8.65   7.59   7.01   6.63   6.37   6.03   5.67   5.28   4.86
   9     10.56   8.02   6.99   6.42   6.06   5.80   5.47   5.11   4.73   4.31
  10     10.04   7.56   6.55   5.99   5.64   5.39   5.06   4.71   4.33   3.91
  11      9.65   7.20   6.22   5.67   5.32   5.07   4.74   4.40   4.02   3.60
  12      9.33   6.93   5.95   5.41   5.06   4.82   4.50   4.16   3.78   3.36
  13      9.07   6.70   5.74   5.20   4.86   4.62   4.30   3.96   3.59   3.16
  14      8.86   6.51   5.56   5.03   4.69   4.46   4.14   3.80   3.43   3.00
  15      8.68   6.36   5.42   4.89   4.56   4.32   4.00   3.67   3.29   2.87
  16      8.53   6.23   5.29   4.77   4.44   4.20   3.89   3.55   3.18   2.75
  17      8.40   6.11   5.18   4.67   4.34   4.10   3.79   3.45   3.08   2.65
  18      8.28   6.01   5.09   4.58   4.25   4.01   3.71   3.37   3.00   2.57
  19      8.18   5.93   5.01   4.50   4.17   3.94   3.63   3.30   2.92   2.49
  20      8.10   5.85   4.94   4.43   4.10   3.87   3.56   3.23   2.86   2.42
  21      8.02   5.78   4.87   4.37   4.04   3.81   3.51   3.17   2.80   2.36
  22      7.94   5.72   4.82   4.31   3.99   3.76   3.45   3.12   2.75   2.31
  23      7.88   5.66   4.76   4.26   3.94   3.71   3.41   3.07   2.70   2.26
  24      7.82   5.61   4.72   4.22   3.90   3.67   3.36   3.03   2.66   2.21
  25      7.77   5.57   4.68   4.18   3.86   3.63   3.32   2.99   2.62   2.17
  26      7.72   5.53   4.64   4.14   3.82   3.59   3.29   2.96   2.58   2.13
  27      7.68   5.49   4.60   4.11   3.78   3.56   3.26   2.93   2.55   2.10
  28      7.64   5.45   4.57   4.07   3.75   3.53   3.23   2.90   2.52   2.06
  29      7.60   5.42   4.54   4.04   3.73   3.50   3.20   2.87   2.49   2.03
  30      7.56   5.39   4.51   4.02   3.70   3.47   3.17   2.84   2.47   2.01
  40      7.31   5.18   4.31   3.83   3.51   3.29   2.99   2.66   2.29   1.80
  60      7.08   4.98   4.13   3.65   3.34   3.12   2.82   2.50   2.12   1.60
 120      6.85   4.79   3.95   3.48   3.17   2.96   2.66   2.34   1.95   1.38
   ∞      6.64   4.60   3.78   3.32   3.02   2.80   2.51   2.18   1.79   1.00